ABSTRACT


This study compares the performance of executing Prolog code on the Berkeley
PLM processor (a special-purpose CISC architecture) and the Berkeley
SPUR  processor  (a  general-purpose  RISC  architecture  with  tagged  data). 
Fourteen  standard  benchmark  programs  were  run  on  both  the  PLM  and 
SPUR  simulators.  The  two  implementations  were  compared  with  regard  to 
static  and  dynamic  program  size,  execution  speed,  and  cache  performance. 
The  simulated  memory  system  included  a  direct-mapped  mixed  instruction 
and data cache. We found that, on average, the macro-coded SPUR implementation
has a static code size 14 times larger than the PLM, executes 16
times more instructions, yet requires only 2.31 times the number of machine
cycles. To achieve the same miss ratio with its much larger code size, the SPUR
implementation requires a cache 4 to 8 times the size of the PLM’s. We
also  suggest  minor  changes  to  the  SPUR  instruction  set  to  improve  its  Prolog 
execution  and  outline  the  design  of  a  special-purpose  SPUR  coprocessor  that 
would  greatly  reduce  the  code  size  and  double  SPUR’s  Prolog  performance. 
1. Introduction

Logic  programming  has  been  the  subject  of  much  attention  since  the  1981 
announcement  that  the  Japanese  Fifth  Generation  Project  [Moto-oka83]  would 
use  Prolog  as  its  principal  programming  language.  Many  computer  researchers  in 
the  artificial  intelligence  community  believe  that  logic  programming  provides  a 
more  direct  and  natural  mapping  of  problem  specifications  into  machine  language 
than  traditional  high-level  languages.  In  the  past  few  years,  much  work  has  been 
done  to  develop  high-performance  machines  for  logic  programming. 

Prolog  is  the  most  popular  of  the  logic  programming  languages.  It  was 
designed  at  the  University  of  Marseille  around  1970  by  Alain  Colmerauer  and  his 
associates.  In  1977,  David  Warren  at  the  University  of  Edinburgh  developed  the 
first  compiled  implementation  of  Prolog  [Warren77].  The  compiler  ran  on  a 
DECsystem-10 and was dramatically faster than previous, interpreted implementations.
Since then, many approaches, ranging from advanced compiler techniques
to microcoded hardware enhancements, have been used to improve the performance
of compiled Prolog code.

Most  compiled  implementations  of  Prolog  are  based  on  refinements  to 
Warren’s  original  abstract  machine  (WAM)  specification  [Warren83].  Warren’s 
instruction  set  corresponds  very  closely  to  the  tokens  of  the  Prolog  language. 
Compiling from Prolog to WAM is therefore a simple and straightforward process
that can reasonably be implemented in Prolog itself.

The Berkeley Prolog Machine (PLM) is a special-purpose microcoded processor
that uses a slightly modified version of the WAM instruction set [Dobry84c].
Efficient Prolog execution is achieved through the much higher code density of the
PLM as compared to conventional, general-purpose architectures. The PLM is
expected to run Prolog ten times faster than the compiled implementation for the
DEC-2060. The PLM is part of the larger Berkeley Aquarius project, whose aim
is to build a 16-processor Prolog multiprocessor with a shared synchronization
memory [Dobry85].

The Berkeley SPUR (Symbolic Processing Using RISCs) project aims to produce
a multiprocessor personal workstation for high-performance general-purpose
processing with some support for Lisp and floating-point computation [Hill86].
The  SPUR  microprocessor  is  a  reduced  instruction  set  computer  (RISC)  with 
extensions  for  tagged  data  types  and  a  large  mixed  instruction  and  data  cache.  It 
includes  a  tightly-coupled  coprocessor  interface.  The  first  coprocessor  to  be 
implemented  will  be  used  for  high-performance  IEEE  standard  floating-point 
operations. 

Our objective is to show how well Prolog programs can be executed on
SPUR, a processor not designed with logic programming applications in mind. We
did not compile Prolog directly into SPUR machine code but instead used the output
of the PLM compiler and performed a macro-expansion of the PLM/WAM
instructions into SPUR instructions. Improvements in performance could
certainly be gained by building a Prolog compiler for the SPUR architecture. We
chose a macro-expansion technique to save time (there was no Prolog
implementation for SPUR) and to compare the two architectures themselves
rather than the difference between two compilers. We feel we have achieved our
objective of finding a lower bound on Prolog performance using macro-expansion
with a few straightforward optimizations.

If Prolog runs efficiently on SPUR, then Prolog programs can be easily
integrated with an operating system, floating-point hardware, and other applications
programs to create a test-bed for experiments in mixed-paradigm programming
systems. Although both SPUR and PLM are the basic elements of larger
multiprocessor machines, we did not consider the issues of parallelism inherent in
Prolog on either of these two architectures.

The next two sections provide background information on the PLM and
SPUR architectures. We then discuss how PLM instructions were translated into
SPUR instructions using macro-expansion. Section 4 considers tradeoffs with
respect to register allocation, stack usage, and the Prolog unification operations in
mapping PLM to SPUR. Section 5 presents our performance comparison results,
including static and dynamic program size, execution speed, and memory cache
effects. We conclude with suggestions to improve Prolog performance with slight
modifications to the SPUR architecture or with the use of a special coprocessor.

The appendices contain the programs that perform the macro-expansions, the
macro-expansions themselves, and the details of a possible implementation of a
Prolog coprocessor for SPUR.
2. The PLM Architecture

The Berkeley PLM is a TTL implementation based on the Warren Abstract
Machine, the target machine of the first Prolog compiler [Warren77]. Warren
implemented a compiler that translated Prolog into abstract machine instructions
that were then macro-expanded into DEC-10 machine code. Previous Prolog
implementations were interpreters that were usually written in Lisp, and therefore
suffered from the inefficiency of being translated twice, once from Prolog to Lisp
and again from Lisp to machine code.

Warren’s  abstract  machine  is  the  basis  of  most  of  the  work  being  done  on 
special-purpose  Prolog  hardware.  This  section  describes  WAM  as  modified  by 
Tick  and  Warren  [Tick83,  Warren83]  and  by  the  Berkeley  PLM  group  [Dobry84b, 
Fagin85].  Dobry  gives  detailed  descriptions  of  the  PLM  in  [Dobry84a,  Dobry84c, 
Dobry85].  Clocksin  and  Mellish  [Clocksin81]  provide  a  good  foundation  for  the 
basics  of  Prolog  and  logic  programming. 

2.1.  PLM  Data  Types  and  Representation 

The PLM supports four (tagged) data types: constants, variables, lists, and
structures. Constants have a minor type of nil, integer, atom, or floating point.
Variables, actually pointers to other data structures, are bound (point to a data
cell) or unbound (point to themselves). Lists and structures are cdr-coded to eliminate
pointers to successive locations (car and cdr cells are distinguished by using
a tag bit). This technique can eliminate up to half the memory necessary
to represent lists and structures, as well as many pointer dereferences, when the
elements of a list can be kept in a contiguous group of memory locations. Figure
1 summarizes the PLM data types and illustrates their layouts. It is important to
note that the PLM tags consist of three orthogonal fields: type, sub-type, and cdr
(or bound) bit. All tags also have a bit used by a garbage collection algorithm.
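
The tag organization described above can be pictured with a short sketch. The field widths come from the text (a two-bit type, a one-bit cdr-or-bound flag, a two-bit constant subtype, and one garbage-collection bit); the bit positions, the numeric codes, and the names make_tag, split_tag, and is_unbound are assumptions made only for illustration and are not the PLM's actual encoding.

```python
from enum import IntEnum


class Type(IntEnum):
    # Numeric codes are illustrative assumptions, not the PLM's encoding.
    VARIABLE = 0
    STRUCTURE = 1
    LIST = 2
    CONSTANT = 3


class Subtype(IntEnum):
    # Minor types of constants; the codes are again assumed.
    NIL = 0
    INTEGER = 1
    ATOM = 2
    FLOAT = 3


# Assumed bit layout of the tag, least significant bit first:
#   bit 0      garbage-collection mark
#   bit 1      cdr/bound flag
#   bits 2-3   constant subtype
#   bits 4-5   major type
def make_tag(type_, flag=0, subtype=0, gc=0):
    """Pack the orthogonal tag fields into one small integer."""
    return (int(type_) << 4) | (int(subtype) << 2) | (flag << 1) | gc


def split_tag(tag):
    """Unpack a tag into (type, flag, subtype, gc)."""
    return (Type((tag >> 4) & 0b11), (tag >> 1) & 1,
            Subtype((tag >> 2) & 0b11), tag & 1)


def is_unbound(tag, value, address):
    """An unbound variable is a variable cell that points to itself."""
    type_, _, _, _ = split_tag(tag)
    return type_ == Type.VARIABLE and value == address
```

Because the fields are orthogonal, each one can be tested with a single mask, which is the property the text emphasizes.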

2.2.  PLM  Registers  and  Data  Structures 

The  PLM  organizes  memory  into 

•  a  code  segment; 

•  five  stacks,  consisting  of  the  environment  stack,  choice  point  stack,  “heap”, 
trail  stack,  and  the  “push  down  list”; 

•  sixteen  data  registers; 

•  a  few  mode  bits. 

These  features  are  described  below. 

2.2.1.  Environment  Stack 

The environment stack contains activation records of active Prolog clauses.
An activation record consists of a pointer to the previous record, a code space
address to jump to should the clause succeed, the clause argument count, room to
store the clause’s local variables, and a pointer to the last choice point entry
should the clause use the cut operator. (The cut operator increases backtracking
efficiency by pruning branches from the depth-first search tree.)

[Figure 1: word layouts for the bound variable, unbound variable, structure,
list head, list continuation, NIL, and constant cells.]

Figure 1. PLM data types. Tags consist of a two-bit field for the type of data:
variable, structure, list, or constant. Variables contain a one-bit field indicating
whether they are bound (and point to another data cell) or unbound (and point to
themselves). This bit is always zero in structure pointers. In list pointers this
field distinguishes car from cdr cells in the cdr-coded data structure supported by
the PLM. This bit is also set to indicate a constant is nil. Constants also have
another two-bit subtype field. All tags have a bit used by a garbage collection
algorithm.


2.2.2.  Choice  Point  Stack 

This  stack  contains  procedure  choice  points.  A  choice  point  is  a  set  of  15 
PLM  registers  containing  the  necessary  state  to  backtrack  to  a  previous  node  in 
the  search  tree.  It  consists  of  the  original  procedure  arguments,  a  pointer  to  the 
environment of the calling procedure, a pointer to the top of the heap at the time
the  procedure  was  invoked,  a  pointer  to  the  top  of  the  trail  stack  at  procedure 
invocation  time,  a  code  space  address  should  the  procedure  succeed,  a  code  space 
address  should  backtracking  be  necessary,  and  a  pointer  to  the  previous  choice 
point  (this  is  needed  since  the  PLM  interleaves  the  choice  point  stack  and  the 
environment  stack  in  the  same  stack  segment). 

2.2.3.  Heap 

Space for dynamically constructed lists and structures is sequentially allocated
from the heap, which behaves very much like a stack. Heap space is
reclaimed when a procedure fails, but must be garbage-collected periodically since
data structures created by successful clauses would not otherwise be reclaimed.

2.2.4.  Trail  Stack 

As  Prolog  programs  perform  their  pattern  matching  functions,  variables  in 
one  data  structure  are  “bound”  (i.e.  made  to  point)  to  the  corresponding  element 
of  the  second  data  structure.  In  order  to  restore  the  computation’s  state  when  a 
clause  fails  (pattern  matching  fails)  and  backtracking  occurs,  all  variable  bindings 
must be reversible. The trail contains pointers to variables on the heap that have
become bound during procedure execution. On goal failure, all trail entries above
the saved trail pointer stored in the choice point are read and the variables they
point to are unbound.
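
A minimal sketch of how a choice point's saved pointers drive backtracking, assuming Python lists stand in for the heap, trail, and choice point stacks. The class and method names are hypothetical, only a subset of the choice point fields is modeled, and every binding is trailed unconditionally for simplicity.

```python
from dataclasses import dataclass


class Cell:
    """A heap variable cell; a value of None stands for 'unbound' here
    (the PLM instead makes an unbound variable point to itself)."""
    def __init__(self):
        self.value = None


@dataclass
class ChoicePoint:
    args: tuple        # original procedure arguments
    heap_top: int      # top of the heap when the procedure was invoked
    trail_top: int     # top of the trail when the procedure was invoked
    retry_addr: int    # code address to resume at on backtracking


class Machine:
    def __init__(self):
        self.heap = []     # sequentially allocated cells
        self.trail = []    # cells bound since they were created
        self.choices = []  # choice point stack

    def new_var(self):
        cell = Cell()
        self.heap.append(cell)
        return cell

    def push_choice(self, args, retry_addr):
        self.choices.append(ChoicePoint(
            tuple(args), len(self.heap), len(self.trail), retry_addr))

    def bind(self, cell, value):
        cell.value = value
        self.trail.append(cell)  # record the binding so it can be undone

    def backtrack(self):
        """Undo bindings and reclaim heap back to the newest choice point."""
        cp = self.choices[-1]
        while len(self.trail) > cp.trail_top:
            self.trail.pop().value = None   # unbind the trailed variable
        del self.heap[cp.heap_top:]         # reclaim heap space
        return cp.args, cp.retry_addr
```

The sketch also shows why the heap "behaves very much like a stack": failure simply resets its top to the value saved in the choice point.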

2.2.5.  Push  Down  List 

The push down list is a high-speed stack inside the PLM and is used to store
pointers into two data structures that are being unified (i.e. pattern matched).
Since lists or structures can contain embedded lists or structures as elements,
unification is a recursive process. Lists or structures are unified in a depth-first,
post-order tree traversal. The push down list is an optimization to reduce the
complexity of managing another stack in memory and to increase unification
performance.
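
The role of the push down list can be illustrated by a unifier that keeps its pending work on an explicit stack instead of recursing. This is a sketch under stated assumptions, not the PLM microcode: compound terms are modeled as non-empty Python tuples whose first element is the functor, constants are plain Python values, the Var class and the function names are hypothetical, and there is no occurs check.

```python
class Var:
    """A logic variable; ref is None while the variable is unbound."""
    def __init__(self):
        self.ref = None


def deref(term):
    """Follow bindings until reaching a non-variable or an unbound variable."""
    while isinstance(term, Var) and term.ref is not None:
        term = term.ref
    return term


def unify(a, b, trail):
    """Unify two terms, recording every new binding on the trail."""
    pdl = [(a, b)]                      # explicit push down list
    while pdl:
        x, y = pdl.pop()
        x, y = deref(x), deref(y)
        if x is y:
            continue                    # same cell or same object
        if isinstance(x, Var):
            x.ref = y
            trail.append(x)
        elif isinstance(y, Var):
            y.ref = x
            trail.append(y)
        elif isinstance(x, tuple) and isinstance(y, tuple):
            if len(x) != len(y) or x[0] != y[0]:
                return False            # functor or arity mismatch
            # Push the argument pairs; nested terms are handled when popped.
            pdl.extend(zip(x[1:], y[1:]))
        elif x != y:
            return False                # distinct constants
    return True
```

Keeping the pending pairs on a dedicated stack is what lets embedded lists and structures be unified without a recursive call per nesting level.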

2.2.6. Registers

Finally, the PLM has the following special-purpose registers: a program
counter register (P); a goal success program counter (CP); pointers to the top of
the environment, choice point, trail, and heap stacks (E, B, Tr, and H respectively);
a structure pointer register (S); the procedure argument count register (N);
eight argument registers (AX1-AX8); and two bits of control state called the cut
bit and the read/write mode bit.

2.3.  PLM  Instructions 

PLM instructions fall into five classes:
•  procedure control instructions that manipulate choice points;

•  indexing  instructions  that  perform  multi-way  branches  depending  on  the  type 
and  value  of  an  argument  register; 

•  clause control instructions that manipulate activation records on the
environment stack;

•  get  and  put  instructions  that  verify  and  prepare  goal  arguments  respectively; 
and 

•  unify  instructions  that  construct  and  compare  structures  and  lists  one  element 
at  a  time  by  pattern  matching  the  corresponding  elements. 

Instruction  lengths  range  from  1  to  6  bytes.  Tailoring  the  instruction  set  to 
the  language  produces  high  code  density  for  the  PLM.  Smaller  program  sizes 
result  in  much  improved  instruction  buffer  and  cache  performance.  This  will  be 
shown  to  be  the  primary  reason  for  the  PLM’s  excellent  performance.  However, 
instruction fetching and decoding are greatly complicated by the variable-length
unaligned instruction format.

The PLM’s instruction buffer performs the bulk of the instruction prefetching
and decoding functions. Instructions are broken up into an 8-bit opcode field, a
32-bit first argument field, and, for instructions of more than one argument,
an additional 32-bit field containing the two single-byte second and third arguments.
These are presented to the central processor for final decoding and execution.
There is only a small set of conditional branch instructions in the PLM.
The instruction buffer stops prefetching when one of these instructions is encountered
and simply waits for the branch to be resolved. The PLM instruction buffer
typically contains from five to sixteen instructions.

2.4.  Stack  Allocation  Optimizations  for  Prolog 

Warren included two more memory saving optimizations in his original
implementation: environment trimming and tail recursion elimination. Environment
trimming frees space from the activation record of a clause as soon as a
variable is no longer referenced. This requires an additional field in the call
instruction so the size of the activation record can be updated as each subclause
of a clause is invoked. Larger memory sizes have probably made this optimization
unnecessary.

Tail recursion elimination discards the activation record of a clause before
the invocation of the last subclause. This optimization is quite valuable in recursive
procedures that would otherwise quickly fill up stack space with unused
activation records. The only restriction imposed by this method is that recursive
clauses should be purely tail recursive. This condition is usually true in practice
and can be enforced in almost every case. However, this important optimization
does require a special PLM instruction to move needed variables from the activation
record to the heap. The activation record’s registers can replace those of the
parent clause since failure of the last recursive clause implies a failure of the
parent clause, popping both records off the stack.

2.5.  System  Support  Functions 

PLM  instructions  are  complex  since  they  execute  in  an  indeterminate  amount 
of  time  (e.g.  recursive  unification)  and  they  can  generate  an  indeterminate  number 
of  memory  references  per  instruction.  The  first  point  makes  instructions  difficult 
to  restart.  A  large  amount  of  micro-engine  state  must  be  preserved  between 
instructions. The second point implies that page faults may occur during the execution
of a long instruction. In the current architecture, the PLM is a coprocessor
to  an  NCR-32  main  processor  that  handles  memory  management  and  process 
scheduling  functions  for  the  PLM.  It  is  important  to  note  that  the  PLM  cannot 
be  easily  context-switched  to  another  Prolog  process  while  a  page  request  is  being 
serviced. 

The NCR host processor not only provides virtual memory support but also
performs I/O system calls and floating point operations for PLM escape instructions.
However, these operations are expensive since they must be performed by a
loosely-coupled coprocessor reached through the system bus rather than a
tightly-coupled coprocessor reached through a direct interface.
3. The SPUR Architecture

The SPUR (Symbolic Processing Using RISCs) architecture is a RISC architecture
augmented with special support for LISP processing and floating-point
computation. The added capabilities include tagged data types and a
tightly-coupled coprocessor interface. SPUR has been designed as a multiprocessor
workstation. It has a 128KB cache that maintains data coherency by using
hardware support for bus snooping. SPUR extends the work of the earlier Berkeley
RISC and SOAR architectures [Katevenis83, Ungar84]. A summary of SPUR
and PLM features is listed in Table 1.

3.1.  SPUR  Registers  and  Tags 

The basic instruction and data word is 32 bits; however, registers are 40 bits
wide  so  as  to  support  an  8-bit  data  tag.  Data  is  always  word-aligned.  Tagged 
data  is  stored  in  2  words  containing  a  total  of  64  bits:  the  first  word  is  the  data 
and  the  second  is  the  tag.  Although  the  SPUR  system  bus  supports  only  32-bit 
transfers  to  the  processor  cache,  all  other  busses  are  40  bits  wide.  Therefore,  a 
penalty  for  tagged  data  transfers  is  only  incurred  when  data  is  brought  into  the 
cache  or  written  back  out  to  memory. 


                        Comparison of PLM and SPUR

Features              PLM                       SPUR
architecture          tagged CISC               tagged RISC
target languages      Prolog only               LISP, C & others
instruction size      1 to 6 bytes              4 bytes
cycles per instr.     1 to 26+                  1 (2 for stores)
(no misses) †                                   4+ for floating-point ops
avg. cycles/instr.    7                         1
cycle time            100-150 ns                100-140 ns
registers             9 special                 138 GPRs in 8 overlapped windows
                      8 argument                (10 global, 6 input, 10 local, and
                                                6 output per window)
cache                 separate I&D, 16KB each   mixed I&D, 128KB
instr. buffer size    5 to 16 instructions      128 instructions
microcode size        1K x 134 bits             not applicable

Table 1. Summary of features of the PLM and SPUR architectures.
† This assumes a perfect memory (i.e. no cache or instruction buffer misses) for
both PLM and SPUR. For SPUR, load instructions are assumed to be followed
by a non-dependent instruction (as is the case in our implementation). SPUR
store instructions stall the pipeline and hence require two cycles.

There are 138 general-purpose registers in SPUR organized into 8 overlapped
windows of 32 registers. Of these 32, there are 10 global registers accessible in all
windows, 6 input registers that overlap with the previous window, 10 local registers
that are accessible in only one window, and 6 outputs that overlap with the
next window. When a function call is executed, the CPU shifts its current window.
Input parameters and output results are passed between functions by using
the overlapped registers. All registers are general-purpose except one of the global
registers that is hard-wired to a zero value.
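
The register count follows from the window organization: 10 globals plus 8 windows that each contribute 6 + 10 = 16 registers of their own (the 6 outputs are the next window's inputs), giving 10 + 8 x 16 = 138. The sketch below checks this arithmetic with a mapping from a window number and a logical register to a physical register. The linear physical layout and the name physical are assumptions for illustration; the actual SPUR register file organization may differ.

```python
GLOBALS, INPUTS, LOCALS, OUTPUTS = 10, 6, 10, 6
WINDOWS = 8
PER_WINDOW = INPUTS + LOCALS            # registers owned by each window
PHYSICAL = GLOBALS + WINDOWS * PER_WINDOW


def physical(window, reg):
    """Map (window, logical register 0-31) to a physical register index."""
    if not (0 <= window < WINDOWS and 0 <= reg < 32):
        raise ValueError("window or register out of range")
    if reg < GLOBALS:
        return reg                      # globals are shared by all windows
    # Inputs and locals live in this window's own block; the outputs fall
    # into the next window's block, wrapping around after the last window.
    offset = reg - GLOBALS
    return GLOBALS + (window * PER_WINDOW + offset) % (WINDOWS * PER_WINDOW)
```

With the numbering used later in Table 2, logical registers 26-31 of one window are the same physical registers as 10-15 of the next, which is how parameters pass between windows without copying.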

Special trap conditions are signaled when the windows overflow (i.e. a function
call depth of more than 8 is reached) or underflow, and a trap handler is used
to move registers to and from memory. When a window is available, function
calls proceed very quickly; memory accesses to preserve register state are
necessary only when the windows overflow or underflow. Halbert and Kessler have
shown that window overflows occur less than 1% of the time for a window size of
8 [Halbert80].


3.2.  SPUR  Instructions 

Instructions fall into three basic types. Most instructions are register-to-register
operations involving the entire 40-bit quantity. Special exceptions are signaled
depending on the value of the tags (e.g. adding two pointers). There are
also instructions that cause operations to be performed on coprocessor registers
(e.g. floating-point add).

The load-to-register and store-from-register instructions are the only instructions
that access memory. There are instructions for making 32-bit
untagged data and instruction accesses as well as full 40-bit tagged accesses.
Specialized instructions are provided for moving data between the cache and the
coprocessor registers in sizes corresponding to the IEEE floating-point standard and for
transfers between processor and coprocessor registers. Lastly, there are comparison
and branch instructions that alter control flow depending on the value of
a CPU register, coprocessor register, or data tag.

SPUR is equipped with an on-chip instruction buffer of 128 words. Instructions
are prefetched but compete with load and store instructions for access to the
mixed instruction and data cache. All instructions are aligned and have a uniform
argument format; consequently, no alignment or predecoding is necessary. As a
result, the instruction buffer is a straightforward on-chip instruction cache.


3.3.  Pipeline  Execution 

SPUR has a four-stage pipeline: instruction fetch; register read and an ALU
operation either to combine operands or to compute an effective address; memory
access (for load and store instructions) or nothing (for register-to-register
instructions); and register write. The pipeline includes circuitry for forwarding data that
may be required by the next instruction and not yet stored back into a register.
The effective throughput is therefore one instruction per cycle.
Due to the structure of the pipeline, the effects of load and branch instructions
are delayed one cycle beyond their execution. The contents of a register
that is loaded from the cache cannot be accessed by the next instruction. It is
necessary to place either an operation that does not use that register or a NOP in
the slot immediately following the load instruction. A similar situation holds for
branch instructions. The instruction immediately following a branch will be executed
whether or not the branch is taken. SPUR provides special branch instructions
that cancel the effects of the subsequent instruction when the branch is
taken. However, careful placement of instructions in the slot following a branch
can greatly improve throughput.
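
The load-delay rule can be illustrated with a naive pass that pads a load with a NOP whenever the following instruction reads the loaded register. The tuple format and opcode names are hypothetical, not SPUR assembly syntax, and a real scheduler would first try to move an independent instruction into the slot.

```python
# Each instruction is (opcode, destination register, source registers).
NOP = ("nop", None, ())


def fill_load_delay_slots(instrs):
    """Insert a NOP after any load whose result the next instruction reads."""
    out = []
    for i, ins in enumerate(instrs):
        out.append(ins)
        op, dest, _ = ins
        if op == "load" and i + 1 < len(instrs):
            _, _, next_srcs = instrs[i + 1]
            if dest in next_srcs:
                out.append(NOP)         # the loaded value is not ready yet
    return out
```

A load followed by an instruction that does not use its result is left untouched, which matches the assumption in Table 1 that loads cost one cycle when followed by a non-dependent instruction.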

3.4.  SPUR  Address  Spaces 

SPUR has a 38-bit global virtual address space and a 32-bit process virtual
address space. The high-order two bits of the address select one of four segments
and the remaining 30 bits are an offset into the segment. Typically, the first segment
contains the operating system and the other three contain the process’s
code, data, and stack segments, respectively. During address translation, the 2-bit
segment number is used to index a set of four segment registers, each of which
contains an 8-bit global segment number that selects one of 256 segments. The
8-bit segment number is concatenated with the 30-bit offset to form the 38-bit
global virtual address.
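
The translation step described above amounts to a table lookup and a concatenation. The following sketch assumes the four segment registers are given as a list of 8-bit integers; the function name to_global is hypothetical.

```python
OFFSET_BITS = 30
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def to_global(process_addr, segment_regs):
    """Translate a 32-bit process virtual address to a 38-bit global one.

    segment_regs holds the four 8-bit global segment numbers.
    """
    if not 0 <= process_addr < (1 << 32):
        raise ValueError("process address must fit in 32 bits")
    if len(segment_regs) != 4 or any(not 0 <= s < 256 for s in segment_regs):
        raise ValueError("expected four 8-bit segment numbers")
    index = process_addr >> OFFSET_BITS         # high-order two bits
    offset = process_addr & OFFSET_MASK         # low-order 30 bits
    return (segment_regs[index] << OFFSET_BITS) | offset
```

For example, with segment register 1 holding 0x12, the process address 0x40000010 maps to global segment 0x12 at offset 0x10.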

3.5.  Coprocessor  Interface 

SPUR  supports  a  tightly-coupled  coprocessor  model.  The  CPU  initiates  all 
data  transfers  between  the  coprocessor  registers  and  the  cache  using  a  64-bit  wide 
data  bus.  All  instruction  dispatching  is  performed  by  the  CPU.  When  the  CPU 
fetches  an  instruction  that  is  not  for  the  main  processor  it  forwards  it  to  the 
coprocessor. 

The coprocessor interface consists of a set of lines for communicating the
coprocessor instruction opcode and register arguments. One control signal indicates
to the coprocessor that a new instruction is being presented by the CPU and
another indicates the coprocessor has completed execution. When the coprocessor
requires more than one cycle for instruction execution it suspends the CPU pipeline
by asserting a coprocessor busy signal.

Coprocessors can also operate in parallel with the CPU. All responsibility for
waiting  the  appropriate  amount  of  time  for  results  to  be  available  rests  with  the 
CPU.  An  extra  bit  in  the  interface  is  used  to  select  one  of  two  coprocessors  that 
could  both  be  used  in  parallel. 

3.6. System Support Functions

SPUR handles all its own traps and page faults. For this reason, unlike the
PLM, all instructions are restartable and atomic in their operation. The SPUR
processor has all the general-purpose instructions and trap handling capability to
directly support an operating system. The PLM would require additional microcode
to support these functions.
4. Implementing Prolog on SPUR

This section describes the mechanics of macro-expanding Prolog into SPUR
assembly language and how the state of the PLM is mapped onto the SPUR
registers.


4.1.  Macro-expanding  PLM  on  SPUR 

We chose to macro-expand PLM instructions rather than write a full Prolog
compiler for SPUR because the simplicity of this approach enabled us to find a
lower bound on SPUR Prolog performance in just a few weeks. Undoubtedly, a
compiler would achieve much better performance. In this section, we review the
design alternatives we considered to represent the PLM’s state and describe the
design we chose to implement. We also describe the tools we developed to
automatically macro-expand PLM instructions to SPUR code.

4.1.1.  Choice  Points,  Environments,  and  Registers 

Many  of  the  PLM  registers  point  into  its  multiple  data  and  activation  record 
stacks.  We  chose  a  register  allocation  scheme  that  follows  the  tenet  that  the 
optimal  register  layout  is  the  one  that  reduces  the  processor-memory  bandwidth. 
Since  choice  points  are  much  larger  than  environment  activation  records,  we 
decided  to  exploit  SPUR’s  register  windows  for  choice  point  buffering.  Register 
windows  cannot  be  used  to  represent  the  environment  and  heap  structures  since  it 
must  be  possible  to  bind  to  these  structures  and  it  would  be  extremely  complex  to 
bind  to  registers  (registers  don’t  have  a  memory  address).  Using  them  as  a  trail 
buffer  is  difficult  since  trail  entries  are  single  words  and  really  require  a  hardware 
stack  rather  than  SPUR’s  overlapping  register  windows.  Also,  since  access  to  the 
trail  is  strictly  LIFO,  any  buffering  scheme  would  only  eliminate  a  single  store  and 
a  single  load  per  entry.  The  Berkeley  PLM  research  group  estimated  that  trail 
buffering in hardware could at best yield a 1% performance gain [Dobry84c].

Our register usage is shown in Table 2. From this table one can see that
there is a close match between the size of a choice point and the size of a SPUR
register window. Each window keeps the argument registers local and overlaps
state registers with the preceding and following windows (choice points). This is
precisely the type of behavior required, since access to the previous choice point
is also required in the PLM. Choice point buffering with register windows reduces
the instruction/data fetch collisions on the try, try_me_else, retry, and
retry_me_else instructions. The contents of choice point registers can be obtained
from internal processor registers rather than from a stack frame in memory, so
these accesses do not interfere with data fetches. Backtracking is accomplished
with register move operations and the shifting of register windows. Choice point
buffering appears to be the only natural use for SPUR’s register windows.

The equivalent of the PLM B register in the SPUR implementation no longer
contains a pointer into memory but now points to a register window. This
number is incremented when a choice point is pushed and decremented when a

                 SPUR Register Allocation for PLM

Type      Register   Use
-------   --------   ------------------------------------------
Globals   0          hardwired 0
          1-8        PLM AX1-AX8
          9          Pointer to constant table

Inputs               Previous Choice Point
          10         PLM E register
          11         PLM TR register
          12         PLM H register
          13         PLM B register
          14         PLM BP register
          15         PLM CP register

Locals               Linkage and Temporaries
          16         PLM S mode flag
          17         PLM Cut mode flag
          18-25      PLM AX1-AX8 and CP when
                     window becomes choice point

Outputs              Current Choice Point
          26         PLM E register
          27         PLM TR register
          28         PLM H register
          29         PLM B register
          30         PLM BP register
          31         PLM CP register

Table 2. Register allocation for PLM registers in SPUR register windows.
Globals are used for the argument registers of the current choice point and
for pointers to constant tables. When a choice point needs to be pushed onto
the stack, the argument registers are moved into the local registers and the
register window is shifted. Otherwise the locals are used for temporaries
and the S register, which is not shared across choice points. The overlap of
the input and output registers makes the values of the registers of the
previous choice point available to the current one.

choice point is popped. The PLM's S register is a local temporary register
since its value is not needed across choice points. The ten SPUR local
registers are used as temporaries by the macro expansion code. The eight
argument registers are kept in global registers. This leaves only a single
register (R9) for indexing into a constant table. This originally led us to
suggest that more global SPUR registers would be useful; however, a level of
indirection solves this problem at minimal cost.

Since the PLM design includes only a single choice point buffer, we
anticipate better performance than the PLM for programs that push and pop
choice points often but not too deeply (that is, not more than 8 deep).
Operations that push choice points account for 10% of the execution time of
the PLM, whereas operations that pop choice points account for only 3%. This
asymmetry is expected: a single cut instruction can pop multiple choice
points. Efficient programs perform less backtracking and would be expected
to pop more than one choice point in a single operation; a 3:1 push to pop
ratio is typical. For programs that do push choice points deeply, our design
will suffer the same performance degradation as the PLM. However, this type
of behavior is unlikely to be a part of most programs.
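
The relationship between choice point depth and the number of register
windows can be illustrated with a small model. The sketch below is only an
illustration and not the SPUR mechanism itself: the class name, the window
count of 8 (taken from the depth limit mentioned above), and the policy of
spilling or filling one window per overflow or underflow are all assumptions.

```python
class ChoicePointWindows:
    """Toy model of choice points buffered in register windows.

    B is treated as a window index rather than a memory pointer.
    The window count and the one-window spill/fill policy are assumed.
    """

    def __init__(self, num_windows=8):
        self.num_windows = num_windows  # assumed number of on-chip windows
        self.b = 0          # depth of the choice point stack (the "B" index)
        self.resident = 0   # choice points currently held in registers
        self.spills = 0     # windows written to memory on overflow
        self.fills = 0      # windows read back from memory on underflow

    def push(self):
        """Push a choice point: B is incremented."""
        if self.resident == self.num_windows:
            self.spills += 1        # oldest window is saved to memory
        else:
            self.resident += 1
        self.b += 1

    def pop(self):
        """Pop a choice point: B is decremented."""
        if self.b == 0:
            raise IndexError("no choice point to pop")
        self.b -= 1
        self.resident -= 1
        if self.resident == 0 and self.b > 0:
            self.fills += 1         # previous choice point restored from memory
            self.resident = 1


def memory_traffic(depth, num_windows=8):
    """Spills plus fills for pushing `depth` choice points, then popping all."""
    windows = ChoicePointWindows(num_windows)
    for _ in range(depth):
        windows.push()
    for _ in range(depth):
        windows.pop()
    return windows.spills + windows.fills
```

In this model a program that never exceeds the assumed window count causes
no window traffic to memory, while each choice point pushed beyond that depth
costs one spill and, later, one fill.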

An alternative to the register layout that we chose is to eliminate the
argument registers (AX) and store the arguments in the procedure activation
record as done in other programming languages. However, this requires more
load and store instructions that would add to the instruction-data fetch
bandwidth bottleneck. The advantage of this method is that it frees the SPUR
global registers for other uses. Neither approach requires modification to
the PLM compiler and their true merits will best be determined by
simulation.

4.1.2.  Register  Windows  and  the  Recursive  Unify  Operations 

Unfortunately, the large size of SPUR's register windows reduces their
effectiveness for recursive unification. Recursive unification must save
only three registers per invocation. Since 16 new registers become available
on a window shift, this leaves 13 registers unused. Unrolling the unify code
five times would permit the use of 15 of the 16 registers. This seems
promising; however, the increased code size could be detrimental to
instruction buffer and cache performance. In addition to unrolling the unify
code, there are at least two other possible approaches to implementing
recursive unify: implementing the recursion stack in memory, and using the
register windows directly by replacing the overflow/underflow trap handler
to save and restore only three registers. Yet another possibility is to have
the SPUR hardware provide two window sizes, one as currently implemented and
another small one for procedures that may be highly recursive but only
require a small number of arguments. We chose the simplest path and
implemented our own recursion stack in memory.
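
The register arithmetic behind the unrolling argument can be checked with a
few lines of code. This is only a worked example of the figures quoted above
(3 registers per invocation, 16 new registers per window shift); the function
and constant names are invented for the illustration.

```python
REGS_PER_WINDOW_SHIFT = 16   # new registers made available by one window shift
REGS_PER_UNIFY_CALL = 3      # registers saved per recursive unify invocation


def unused_registers(unroll_factor):
    """Registers left idle in one window when unify is unrolled this many times."""
    used = unroll_factor * REGS_PER_UNIFY_CALL
    if unroll_factor < 1 or used > REGS_PER_WINDOW_SHIFT:
        raise ValueError("unroll factor does not fit in one window")
    return REGS_PER_WINDOW_SHIFT - used


def max_unroll_factor():
    """Largest unroll factor whose saved registers fit in one window shift."""
    return REGS_PER_WINDOW_SHIFT // REGS_PER_UNIFY_CALL
```

Without unrolling, `unused_registers(1)` gives the 13 idle registers noted in
the text; at the maximum factor of five only one register is left unused.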

4.1.3.  Macro-expanding  PLM  Instructions 

Although there are papers that describe the PLM instruction set, the PLM
simulator [Dobry84a] is the only place that accurately describes the
semantics of all PLM instructions. Our approach to macro-expanding PLM to
SPUR was therefore to use the model of PLM functionality provided by the PLM
level 1 simulator. The simulator is written in C and contains a separate
procedure for each PLM instruction. Some of the procedures share common
subroutines. Essentially, we hand-compiled the procedures in the PLM
simulator into SPUR assembly language; the functions that simulated an
instruction were made into macros and the common functions were put into a
subroutine library loaded with every SPUR program. Since we tried not to
deviate from the PLM simulator, we employed minimal optimizations. The only
optimizations were to use the SPUR tags to simulate the PLM tags and
register windows for choice points.

In  addition  to  the  special  stack  needed  for  recursive  unify,  we  also  had  to 
implement  our  own  procedure  call  mechanism  for  calling  the  common  functions. 
When  calling  a  common  function,  the  arguments  are  put  into  temporary  registers, 
the  return  address  is  put  into  a  temporary  register,  and  then  the  SPUR  jump 
instruction  is  used  to  execute  the  procedure.  No  registers  have  to  be  saved. 

We  implemented  two  of  the  commonly  used  large  macros  as  function  calls  in 
order  to  reduce  the  code  size  at  the  expense  of  four  extra  instructions,  two  to  call 
and  two  to  return.  Since  these  macros  were  complex,  the  extra  overhead  is  small 
compared  to  the  number  of  instructions  executed  in  the  function.  Table  3  shows 
the  improvements  in  code  size  because  of  this  optimization  on  the  Prolog 


         Static Code Size of Macro-expanded Benchmarks

C1: no functions.
C2: functions for the two largest macros.
C3: functions for 3 more large macros.

Benchmark       C1      C2   C1/C2      C3   C2/C3   C1/C3
----------   -----   -----   -----   -----   -----   -----
con1           594     414    1.44     385    1.08    1.54
con6           610     430    1.42     401    1.07    1.52
divide10      4922    3988    1.23    1688    2.36    2.92
hanoi          585     385    1.52     385    1.00    1.52
log10         4606    4040    1.14    1676    2.41    2.75
mutest        2945    1703    1.73    1152    1.48    2.56
nrev1         2153     761    2.83     669    1.14    3.22
ops8          4692    3804    1.23    1632    2.33    2.88
palin25       3982    2556    1.56    1632    1.57    2.44
pri2          2061    1933    1.07    1704    1.13    1.21
qs4           3608    1230    2.93    1038    1.19    3.48
queens        3826    3636    1.05    2998    1.21    1.28
query         4136    3942    1.05    3768            1.10
times10       4398    3988    1.25    1728    2.31    2.55
Geom. mean                    1.45            1.44    2.06

Table 3. This table shows the reduction in static code size when large
commonly used macros are implemented as subroutine calls to a library
routine. The first column shows the code size if none of the macros are
turned into functions. The second column shows the code size if
unify_constant and unify_value are turned into functions, as was done in our
macro-expansion. The third column shows the ratio of column 1 to column 2.
Column 4 shows the code size if get_list, get_structure and unify_variable
were also turned into functions. Columns 5 and 6 show the ratio of column 2
to column 4 and of column 1 to column 4 respectively. The bottom row is the
geometric mean over all of the benchmarks. A short description of the
benchmarks is given in Table 4.

benchmarks. It also shows what would happen if we implemented some of the
smaller commonly used macros as function calls. We see that the benchmarks
with two macros implemented as function calls have a code size 44 percent
greater than that attainable if five macros were function calls. Overall,
benchmarks that do not use function calls for the macros are over twice as
large as benchmarks that use all five.
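
The ratio columns and the geometric-mean row of Table 3 are straightforward
to recompute. The following sketch shows the calculation for two rows whose
sizes are copied from the table; the function names and the dictionary are
introduced here only for illustration.

```python
import math


def geometric_mean(values):
    """Geometric mean of a sequence of positive numbers."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def size_ratios(c1, c2, c3):
    """Return the (C1/C2, C2/C3, C1/C3) ratios, rounded to two places."""
    return (round(c1 / c2, 2), round(c2 / c3, 2), round(c1 / c3, 2))


# Static sizes (in instructions) copied from two rows of Table 3.
TABLE3_ROWS = {
    "con6": (610, 430, 401),
    "hanoi": (585, 385, 385),
}

for name, sizes in TABLE3_ROWS.items():
    print(name, size_ratios(*sizes))
```

A geometric mean is the natural summary for a column of ratios because it
treats a halving and a doubling symmetrically, which an arithmetic mean does
not.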

4.1.4.  Software  to  Apply  Macro-expansions  to  Benchmark  Programs 

Once we had generated the macro-expansions of PLM instructions to SPUR
instructions, we developed a software system to automatically apply the
macro-expansions to the compiled PLM benchmark programs. We developed
preproc and postproc, and used /lib/cpp (a standard Unix utility) as well as
sas and sld (written by the SPUR Lisp group) to generate the
macro-expansions. The sequence that transforms a PLM program into SPUR
assembly code is:

preproc 

It  takes  the  PLM  instructions  and  puts  them  in  a  format  that  can  be  used  by 
/lib/cpp  to  perform  the  macroexpansion.  It  also  extracts  all  constants  from 
the  PLM  code  and  puts  them  into  a  constant  table. 

/lib/cpp

This program is the standard Unix C preprocessor. Its purpose is to
macro-expand the properly formatted PLM instructions into SPUR assembly
language.

postproc 

By the time this program sees the PLM program, it has already been
macro-expanded to SPUR code. However, most of the macros have labels within
them. Since a macro can be used in many different places in the code and
labels must be global, there would be many label conflicts if the code were
passed directly to the SPUR assembler. The purpose of postproc is to change
all of the labels to unique global labels.

sas  and  sld 

These  are  the  SPUR  assembler  and  loader.  They  take  SPUR  assembly  code 
and  turn  it  into  object  code  that  runs  on  the  SPUR  simulator. 
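
The label problem that postproc solves can be illustrated with a short
sketch. The code below is not the postproc program: the label syntax (`L`
followed by digits), the assembly mnemonics in the usage note, and the idea
of receiving one string per expanded macro instance are all assumptions made
for the example.

```python
import re

# Hypothetical local-label syntax: an "L" followed by digits, e.g. "L1".
LOCAL_LABEL = re.compile(r"\bL\d+\b")


def uniquify_labels(expansions):
    """Give each macro expansion its own copy of every local label.

    `expansions` is a list of strings, one per expanded macro instance.
    The label "L1" in expansion number 7 becomes "L1_7".
    """
    renamed = []
    for index, text in enumerate(expansions):
        suffix = f"_{index}"
        renamed.append(LOCAL_LABEL.sub(lambda m: m.group(0) + suffix, text))
    return renamed
```

For example, if the same (made-up) macro body `"L1: add r16, r16, 1\n jump
L1\n"` is expanded twice, the first copy refers to `L1_0` and the second to
`L1_1`, so an assembler that requires global labels sees no conflict.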

A script to run this sequence of commands to produce a file that runs on the
SPUR simulator is shown in Appendix 1.

5.  Comparison  of  Prolog  Performance  on  PLM  and  SPUR

Our goal of running the benchmark programs on the SPUR and PLM simulators
and comparing their performance was accomplished in three steps. First, we
wrote macro-expansion and software development tools and applied these to
the benchmarks listed in Table 4. We automatically generated SPUR
instructions from the PLM instructions for all but one (ckt2) of these
benchmark programs. Next, we ran the macro-expanded programs on the SPUR
simulator to determine whether the expansions were correct and to generate
memory reference traces. To verify the correctness of the macro-expansions,
we modified the SPUR simulator to print out the data structures generated by
the Prolog program. Lastly, we modified the PLM simulator to generate memory
traces.

5.1.  Modifications  to  the  PLM  and  SPUR  Simulators 

5.1.1.  The  PLM  Simulator 

Two  PLM  simulators  were  graciously  provided  by  Tep  Dobry  of  the 
Aquarius-PLM  project.  The  level  1  simulator  simulates  the  macro-architecture  of 
PLM  whereas  the  level  2  simulator  simulates  the  micro-architecture.  We  chose  to 


                            Prolog Benchmarks

                                                                      Lines
Name       Description                                                of PLM
--------   --------------------------------------------------------   ------
con1       Deterministic concatenation of two lists.                      29
con6       Non-deterministic concatenation of two lists.                  33
ckt2       Design of a 2 by 1 MUX using NAND gates.
divide10   Symbolic differentiation using division.
hanoi      Solution to Tower of Hanoi problem with 8 disks.               55
log10      Symbolic differentiation using logarithms.                    216
mutest     Proof of theorem in Hofstadter's mu math system.              142
nrev1      Naive reversal of a list of 30 numbers.                        73
ops8       Symbolic differentiation using a polynomial.                  214
palin25    Program to generate a palindrome.                             187
pri2       Program to find prime numbers less than 100.                  141
qs4        A quicksort program of 50 numbers.                            125
queens     Solution to the queens problem on a 4 by 4 chess board.       267
query      A data base query problem.                                    340
times10    Symbolic differentiation using multiplication.                222

Table 4. The 15 benchmarks used in the PLM performance study [Dobry85].
All have been implemented on SPUR except for ckt2.

instrument the level 1 simulator because it is much easier to understand and
modify than the level 2 version. It is important to note that the level 1
simulator was designed to run the benchmarks and is not capable of executing
all Prolog programs. Many system support and escape instructions are not
implemented. This is the reason that larger benchmark programs were not run.

The simulator, as provided, kept frequency counts of instructions and
statistics on the number of reads and writes, dereferences, unifications and
bindings. We enhanced the simulator by adding code to output memory
reference traces, which supply the data for our cache studies, and to
compute the number of cycles executed. Very few changes were required to
record data references because data references go through the two routines
stick (data write) and stuck (data read). However, more detective work was
needed to make sure that all references to the code space ("Cspace") were
recorded, since constants as well as instructions are stored in Cspace. In
addition, we modified the simulator to record the real size of instructions,
so that each instruction reference also records the size of the instruction
being fetched.

The level 1 simulator, unlike the level 2 simulator, does not keep track of
the number of cycles executed. We added a table containing the average
number of cycles executed for each instruction. The values in the table were
derived using the same calculation style as the Berkeley PLM group. Where
decisions had to be made, we attempted to calculate the worst case path,
with the exception of the general unify, decdr, and dereference operations,
since the PLM group chose average times for these operations in their
calculations. Hence, data structures with multiple dereferences take longer
than the table suggests. We compute the total number of cycles executed by a
program from the instruction frequency and the cycle tables. The total
cycles executed is not a precise value but a lower bound of the real value.
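
The cycle estimate described above is a weighted sum over the instruction
frequency table. A minimal sketch of that calculation follows; the function
name is invented, and the counts and per-instruction cycle values are
placeholders chosen for the example, not measured PLM figures.

```python
def total_cycles(frequency, cycles_per_instruction):
    """Sum count * average-cycles over every instruction that was executed."""
    return sum(
        count * cycles_per_instruction[name]
        for name, count in frequency.items()
    )


# Placeholder numbers for illustration only; they are not measured PLM values.
example_frequency = {"get_structure": 10, "unify_variable": 25, "proceed": 5}
example_cycles = {"get_structure": 6, "unify_variable": 2, "proceed": 3}

print(total_cycles(example_frequency, example_cycles))
```

Because each table entry is a fixed per-instruction figure, an estimate of
this kind cannot account for data-dependent costs such as long dereference
chains.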

A 'hook' routine was added to barb (the SPUR simulator) to handle escape
calls. Escapes are functions that cannot be handled by the PLM and must be
handled by the host. In our implementation arithmetic comparison escapes are
handled in-line, but escapes for I/O and arithmetic are handled in barb.
Many escapes are analogous to system calls, so it is fair not to expect
either the SPUR or PLM implementations to handle them in-line.

5.2.  Results 

We compared the static and dynamic code sizes (the latter being the number
of instructions executed) of the SPUR and PLM versions of the benchmarks in
Tables 5 and 6. The SPUR versions of the benchmarks are, as expected,
uniformly larger than their PLM counterparts. Table 5 shows static sizes of
the benchmarks in instructions and bytes. The instruction ratios range from
7.40 (hanoi) to 19.52 (log10). The low ratio for hanoi is because one-half
of its PLM instructions map to sequences of 1 or 2 SPUR instructions; fully
one-third of the PLM instructions executed map to sequences of just 1 SPUR
instruction. The high ratio for log10 is because 40% of

                             Static Code Size

                   PLM               SPUR                    S/P
              ---------------   ----------------   -------------------------
Benchmark     #Instr   #Bytes   #Instr.   #Bytes   Instr.   Bytes   With func
                                                   ratio    ratio   bytes
----------    ------   ------   -------   ------   ------   -----   ---------
con1              28       87       414     1656    14.79   19.03       51.08
con6              32      106       430     1720    13.44   16.23       42.53
divide10         213      661      3988    15952    18.72   24.13       28.36
hanoi             52      183       385     1540     7.40    8.42       23.65
log10            207      625      4040    16160    19.52   25.86       30.32
mutest           141      468      1703     6812    12.08   14.56       20.51
nrev1             71      260       761     3044    10.72   11.71       22.43
ops8             205      633      3804    15216    18.56   24.04       28.44
palin25          178      565      2556    10224    14.36   18.10       23.03
pri2             132      383      1933     7732    14.64   20.19       27.47
qs4              121      456      1230     4920    10.17   10.79       16.90
queens           242      723      3636    14544    15.03   20.12       23.97
query            273     1138      3942    15768    14.44   13.86       16.31
times10          213      661      3988    15952    18.72   24.13       28.35
Geom. mean                                          14.00   17.06       26.11

Table 5. Static Code Size for the PLM and SPUR Benchmarks. The average
number of bytes per PLM instruction is 3.30. There is an additional 697
instructions (1788 bytes) of functions loaded with each SPUR program.


the PLM instructions are either get_structure (approximately 43 SPUR
instructions) or unify_variable (approximately 29 SPUR instructions). The
mean SPUR/PLM ratios for instructions and bytes are 14.00 and 17.06
respectively. The byte ratio is larger because the average PLM instruction
is 3.30 bytes, whereas every SPUR instruction occupies 4 bytes. Note that
these byte ratios do not include the code for a fixed-size subroutine
library loaded with each SPUR benchmark. It is interesting to compare the
size of this subroutine library (1.8KB) with the size of the PLM microcode
(about 17KB).

Comparison of the dynamic code size shows that SPUR executes on average
about 16 instructions for each PLM instruction (see Table 6). The hanoi
benchmark has the lowest ratio (11.81). The query benchmark has the highest
ratio (20.77), for which we do not see a ready explanation. These ratios do
not reflect the real amount of work done by the PLM, since PLM instructions
take from one to over 26 cycles to execute while SPUR instructions only take
one cycle to execute. Comparing the number of SPUR instructions executed to
the number of PLM cycles executed shows that, on average, SPUR requires 2.31
cycles for each PLM cycle. The SPUR/PLM cycle ratio ranges from 1.96 for
hanoi and pri2 to 4.09 for query. Excluding query, the highest ratio is 2.67
for con6.

                       Instructions and Cycles Executed

                   PLM               SPUR           SPUR/PLM       SPUR NOPs     Barb
             ---------------   ---------------   --------------   -------------  Hooks
Benchmark    Instr.   Cycles   Instr.   Cycles   Instr.  Cycles       #       %      #
----------   ------   ------   ------   ------   ------  ------   -----   -----  -----
con1             43      296      627      672    14.58    2.27      78   12.44      2
con6            133     1006     2474     2685    18.60    2.67     294   11.88     30
divide10        447     3512     6724     7374    15.04    2.10     785   11.67      2
hanoi         11996    79097   141644   154911    11.81    1.96   17095   12.07    765
log10           145     1182     2859     3139    19.72    2.57     330   11.54      2
mutest        12983   116967   249960   263243    19.25    2.25   36021   14.41      7
nrev1          4092    31825    62216    65846    15.20    2.07    7993   12.85      2
ops8            260     1935     3583     3918    13.78    2.02     438   12.22      4
palin25        3540    27092    64620    68838    18.25    2.54    9326   14.43     11
pri2          14567   109384   195985   213929    13.45    1.96   22589   11.53    579
qs4            6596    49308    96456   103648    14.62    2.10   11792   12.23  (illegible)
queens         5177    36350    76575    83135    14.79    2.29    9509   12.42    236
query         21973   116845   456396   478223    20.77    4.09   55877   12.24   1910
times10         375     2891     5410     5916    14.43    2.05     650   12.01      2
Geom. mean                                        15.81    2.31           12.39

Table 6. Dynamic Code Size for PLM and SPUR. For SPUR, the number of cycles
executed is calculated by adding the number of instructions executed to the
number of data writes from Table 7, effectively double counting store
instructions. This is necessary because a store instruction takes two cycles
to execute. Using the data from this table and from Table 7, the average
number of cycles per SPUR instruction is 1.07.


The percentage of no-op instructions in the SPUR code averages 12.39%
(Table 6). Most of the no-op slots in the branch, call and return
instructions were not used, but many were used after jumps. The barb hook
column in Table 6 measures the number of calls to barb to handle I/O and
arithmetic operations. Table 7 shows the number of data reads and writes for
the benchmarks. Generally, SPUR does 3% more reads and 18% more writes than
PLM. The ratio of SPUR/PLM reads ranges from 0.72 (log10) to 1.43 (con1) and
the ratio for writes ranges from 0.88 (qs4) to 2.37 (con1). The con1
benchmark has anomalous behavior because it performs almost twice as many
reads and writes for SPUR as for PLM.
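
The derived columns of Table 6 can be recomputed from the raw counts. The
sketch below does so for the first benchmark row, using the instruction,
cycle and no-op counts from Table 6 and the SPUR data-write count from
Table 7; the function and variable names are introduced only for this
illustration.

```python
def spur_cycles(instructions, data_writes):
    """Each instruction takes one cycle; each store costs one extra cycle."""
    return instructions + data_writes


def ratio(spur, plm):
    """SPUR/PLM ratio rounded to two places, as reported in the tables."""
    return round(spur / plm, 2)


# First benchmark row: Table 6 (PLM instr/cycles, SPUR instr/NOPs)
# and Table 7 (SPUR data writes).
plm_instr, plm_cycles = 43, 296
spur_instr, spur_nops, spur_writes = 627, 78, 45

cycles = spur_cycles(spur_instr, spur_writes)
print(cycles)                                   # SPUR cycles
print(ratio(spur_instr, plm_instr))             # instruction ratio
print(ratio(cycles, plm_cycles))                # cycle ratio
print(round(100 * spur_nops / spur_instr, 2))   # NOP percentage
```

For this row the four printed values are 672, 14.58, 2.27 and 12.44, which
agree with the corresponding entries in Table 6.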

5.3.  Analysis  of  Memory  Traces

Up to now, we have compared performance for the two machines assuming that
all memory references are completed in one cycle. This is the type of
performance measurement used in [Dobry85]. A more realistic model of
performance would consider the memory system used by the architecture. The
memory reference traces enable us to do detailed simulations of cache
performance and compare the effect of SPUR's increased code size on the miss
ratio. The memory trace data was analyzed using the dineroIII cache
simulator [Hill83, Hill85].

                      Number of Data References

                      Reads                     Writes
             -----------------------   -----------------------
Benchmark       PLM     SPUR     S/P      PLM     SPUR     S/P
----------   ------   ------   -----   ------   ------   -----
con1             21       30    1.43       19       45    2.37
con6            225      268    1.19      161      211    1.31
divide10        729      578    0.79      598      650    1.09
hanoi         14792    14798    1.00    17100    13267    0.78
log10           334      240    0.72      122      180    1.48
mutest        16131    16590            14502    13283    0.92
nrev1          1637     2728    1.67     1697     3630    2.14
ops8            349      335              314      335    1.07
palin25        3955     5120    1.29     3351     4218    1.26
pri2          19734    19383            23881    17944    0.75
qs4            8196     6890             8196     7192    0.88
queens         7475     7507             7076     6560    0.93
query         41813    48264    1.15    14522    21827    1.50
times10         603      488    0.81      499      506    1.01
Geom. mean                      1.03                      1.18

Table 7. Number of memory data references for PLM and SPUR. All memory
references are assumed to complete within one cycle.


Instruction buffers were not simulated in these studies. Since they perform
very different functions in the two architectures, we felt it would be
difficult to compare the miss ratio results. It is also not clear which
architecture would benefit more if its instruction buffer were included. In
the case of the PLM, the instruction buffer does not reduce memory bandwidth
since it functions primarily as a decoder; at four instructions, it is
clearly too small to capture loops that may exist in the PLM code. In the
case of SPUR, the instruction buffer is a simple instruction cache that
helps to reduce instruction and data fetch contention for the mixed cache.
At 128 words it is large enough to hold many loops and recursive procedures
in the SPUR macro-expansion.

Simulations  were  done  for  two  types  of  caches:  a  mixed  instruction  and  data 
cache  (as  in  SPUR)  and  a  separate  instruction  and  data  cache  (as  in  PLM).  The 
caches  were  direct-mapped  and  varied  in  size  from  2KB  to  128KB  and  infinity.  A 
block  size  of  32  bytes  was  used  in  all  the  simulations. 
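
To make the cache configuration concrete, the following is a minimal sketch
of a direct-mapped cache model with the 32-byte block size used in the
simulations. It is not dineroIII and does not reproduce its options; the
function name and the flat list of byte addresses used as a trace are
assumptions, and every reference is treated alike, as in a mixed cache.

```python
def direct_mapped_miss_ratio(addresses, cache_bytes, block_bytes=32):
    """Miss ratio of a direct-mapped cache over a trace of byte addresses."""
    num_lines = cache_bytes // block_bytes
    lines = [None] * num_lines          # block number held by each cache line
    misses = 0
    total = 0
    for addr in addresses:
        block = addr // block_bytes
        index = block % num_lines       # the one line this block may occupy
        if lines[index] != block:
            misses += 1
            lines[index] = block
        total += 1
    return misses / total if total else 0.0
```

In a direct-mapped cache two blocks whose addresses differ by a multiple of
the cache size compete for the same line, so a trace that alternates between
address 0 and address 2048 misses on every reference in a 2KB cache but only
on the first two references in a 4KB cache.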

Tables 8 and 9 show the results of the simulations done with dineroIII for
the PLM. Generally, separate I&D caches gave better miss ratios when the
cache size was less than 8KB. The mixed and separate miss ratios are the
same after 8KB except for nrev1. This is probably a reflection of the small
size of the benchmark programs. Data and instruction addresses for the PLM
were offset by 2048 to minimize collisions between cache blocks containing
data and instructions. In

              Separate I&D Cache Miss Ratios for PLM

Benchmark      2KB     4KB     8KB    16KB    32KB    64KB   128KB
----------   -----   -----   -----   -----   -----   -----   -----
con1         7.23%   7.23%   7.23%   7.23%   7.23%   7.23%   7.23%
con6         11.56   11.56   11.56    2.12    2.12    2.12    2.12
divide10      5.07    3.33    3.33    3.16    3.16    3.16    3.16
hanoi         0.04    0.04    0.04    0.04    0.04    0.04    0.04
log10         3.83    3.83    3.83    3.66    3.66    3.66    3.66
mutest        0.70    0.70    0.70    0.36    0.36    0.36    0.11
nrev1         1.94    1.76    1.63    1.63    1.63    1.63    1.41
ops8          4.66    4.66    4.66    3.90    3.90    3.90    3.90
palin25       3.61    1.53    1.53    0.96    0.96    0.96    0.92
pri2          3.42    0.78    0.78    0.46    0.46    0.46    0.44
qs4           3.08    1.41    1.01    0.64    0.64    0.64    0.64
queens        2.01    2.01    2.01    1.95    1.95    1.95    0.29
query         5.42    3.15    3.15    0.08    0.08    0.08    0.08
times10       3.59    3.25    3.25    3.05    3.05    3.05    3.05

Table 8. Cache miss ratios for PLM using separate instruction and data
caches of varying sizes. The sizes listed are the total size for instruction
and data caches of equal size. Cache parameters: direct-mapped, separate
I+D, 32-byte blocks.


               Mixed I&D Cache Miss Ratios for PLM

Benchmark      2KB     4KB     8KB    16KB    32KB    64KB   128KB
----------   -----   -----   -----   -----   -----   -----   -----
con1         7.23%   7.23%   7.23%   7.23%   7.23%   7.23%   7.23%
con6         18.69   11.56    2.12    2.12    2.12    2.12    2.12
divide10      5.75    3.33    3.16    3.16    3.16    3.16    3.16
hanoi         8.92    0.04    0.04    0.04    0.04    0.04    0.04
log10         8.15    3.83    3.66    3.66    3.66    3.66    3.66
mutest        6.90    0.70    0.36    0.36    0.36    0.11    0.11
nrev1         3.41    2.83    1.63    1.63    1.63    1.41    1.41
ops8          9.21    4.66    3.90    3.90    3.90    3.90    3.90
palin25       2.30    1.53    0.96    0.96    0.96    0.92    0.92
pri2          6.61    1.27    0.56    0.46    0.46    0.44    0.44
qs4           1.98    1.28    0.64    0.64    0.64    0.64    0.64
queens        9.60    2.01    1.95    1.95    1.95    0.29    0.29
query        13.70    3.15    0.08    0.08    0.08    0.08    0.08
times10       5.89    3.25    3.05    3.05    3.05    3.05    3.05

Table 9. Cache Miss Ratios for PLM using a mixed instruction and data cache.
Various cache sizes were simulated with dineroIII. Cache parameters:
direct-mapped, mixed I+D, 32-byte blocks.


fact, even the numbers for a cache size of 4KB (the minimum to avoid the
extra collisions) were close enough that a comparison of PLM and SPUR with a
mixed cache is justified. We do not want to include the differences in
memory system in the comparison of the two architectures.

The  SPUR  trace  data  was  only  simulated  with  a  mixed  I&D  cache  since  the 
SPUR  hardware  will  be  equipped  with  a  mixed  I&D  cache.  Unlike  PLM,  SPUR 
has  a  single  address  space  per  process  with  separate  segments  for  code  and  data, 
hence  no  offset  was  used  between  SPUR  instruction  and  data  addresses. 

Small benchmarks such as con1, log10, and ops8 had comparatively high miss
ratios even with an infinite cache size (see Table 10). The miss ratio of an
infinite cache is defined as the number of unique blocks referenced divided
by the total number of references. Table 11 compares the data in Tables 9
and 10 for the 8 largest benchmarks and shows that SPUR requires a cache 4
to 8 times larger than PLM to get approximately equivalent miss ratios. It
is interesting to note that this ratio is very close to the actual ratio of
cache sizes in the two implementations. The cache sizes were chosen such
that miss ratios were under 1% and the SPUR and PLM ratios were
approximately the same. The nrev1 benchmark is interesting because SPUR had
a better miss ratio than PLM: under PLM, nrev1 references are 55% code and
45% data, while under SPUR references are 91% code and 9% data. The
benchmark is small and the code miss ratio is about 0.2% for both PLM and
SPUR for a cache size of 16KB and greater. The data miss ratio is 3.4% for
PLM and 7.8% for SPUR. Since the PLM version is half data, the data miss
ratio dominates the overall ratio. Under SPUR, the data miss


               Mixed I&D Cache Miss Ratios for SPUR

Benchmark       2KB      4KB      8KB     16KB    32KB    64KB   128KB
----------   ------   ------   ------   ------   -----   -----   -----
con1         14.65%   12.23%   10.67%   10.53%   9.53%   9.53%   9.53%
con6           9.04     5.52     4.43     3.79    2.91    2.91    2.91
divide10      15.76    10.45     6.87     3.87    3.56    3.56    3.56
hanoi          6.46     2.47     0.35     0.12    0.09    0.09    0.09
log10         14.42    10.24     8.37     8.31    5.58    5.58    5.58
mutest         11.1     4.38     1.18     0.43    0.12    0.12    0.11
nrev1          7.38     2.38     1.08     0.96    0.92    0.92    0.76
ops8          16.69    13.96    11.92     7.97    7.03    6.72    6.72
palin25       13.11     7.08     4.09     2.94    1.31    0.88    0.69
pri2          13.40     7.39     2.12     0.99    0.70    0.68    0.35+
qs4           12.70     5.11     2.55     1.77    1.11    1.11    0.45
queens        15.34     9.99     6.62     2.70    1.17    0.54    0.53+
query         12.22     8.57     2.88     0.87    0.47    0.11    0.11
times10       14.30    11.79     7.99     4.17    3.76    3.76    3.78

Table 10. Cache miss ratios for SPUR using mixed instruction and data caches
of various sizes. Except for pri2 and queens, the 128KB cache was equal to
an infinite cache. The infinite cache miss ratios for pri2 and queens are
0.34% and 0.48% respectively. Cache parameters: direct-mapped, mixed I+D,
32-byte blocks.

           Comparison of Mixed I&D Cache Miss Ratios

                    PLM                      SPUR
             --------------------    --------------------
Benchmark    Size (KB)   Miss        Size (KB)   Miss         S/P
                         Ratio (%)               Ratio (%)
----------   ---------   ---------   ---------   ---------   ----
hanoi                4        0.04          32        0.09      8
mutest               4        0.70          16        0.43      4
nrev1               64        1.41          16        0.96   0.25
palin25              8        0.96          64        0.88      8
pri2                 8        0.56          64        0.68      8
qs4                  8        0.64          64        1.11      8
queens              64        0.29          64        0.54      1
query                8        0.08          64        0.11      8

Table 11. Comparison of SPUR and PLM cache miss ratios using a mixed I&D
cache. The first PLM miss ratio under 1% was chosen for each benchmark. The
corresponding SPUR cache size for an equivalent miss ratio was then used to
determine the SPUR/PLM ratio.

ratio contributes only 9% to the overall miss ratio. Examining the trace files
shows that the high data miss ratio is probably due to the large number of
environment allocations on the stack.
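
The weighting argument for nrev1 can be checked with a few lines of Python. This is only a sketch: it assumes the overall miss ratio is the reference-weighted average of the code and data miss ratios, and it uses the rounded figures quoted in the text (code miss ratio "about 0.2%"), so the results only approximate the simulated values in the tables. The function name is ours.

```python
def overall_miss_ratio(code_frac, code_miss, data_frac, data_miss):
    """Reference-weighted average of code and data miss ratios (in percent)."""
    return code_frac * code_miss + data_frac * data_miss

# nrev1 figures quoted in the text (rounded, so the results are approximate).
plm_overall = overall_miss_ratio(0.55, 0.2, 0.45, 3.4)
spur_overall = overall_miss_ratio(0.91, 0.2, 0.09, 7.8)

print(f"PLM  overall miss ratio: {plm_overall:.2f}%")
print(f"SPUR overall miss ratio: {spur_overall:.2f}%")
```

Although SPUR's data miss ratio is more than twice the PLM's, it enters the average with a weight of only 0.09 instead of 0.45, which is why the overall SPUR ratio comes out lower.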

We also simulated fully-associative caches for PLM and SPUR trace data to
see the effects of conflicts (see Tables 12 and 13). In every benchmark except
queens, the miss ratios for direct-mapped and fully-associative caches were equal
at cache sizes of 8KB and above for PLM and 32KB and above for SPUR. Another
interesting observation is that the miss ratio for fully-associative caches of
2KB for PLM and 8-16KB for SPUR equaled the miss ratio of infinitely large caches.
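
The difference between the two organizations can be illustrated with a toy simulator. The sketch below is not dineroIII and is not the code used for the tables; it assumes 32-byte blocks as in the tables, LRU replacement for the fully-associative cache (our assumption, the replacement policy is not stated here), and a made-up address trace chosen to provoke conflict misses.

```python
from collections import OrderedDict

BLOCK = 32  # bytes per cache block, as in the tables


def miss_ratio_direct(trace, cache_bytes, block=BLOCK):
    """Miss ratio of a direct-mapped cache over a list of byte addresses."""
    n_lines = cache_bytes // block
    lines = [None] * n_lines          # block number held by each line
    misses = 0
    for addr in trace:
        blk = addr // block
        idx = blk % n_lines           # each block has exactly one possible line
        if lines[idx] != blk:
            misses += 1
            lines[idx] = blk
    return misses / len(trace)


def miss_ratio_fully_assoc(trace, cache_bytes, block=BLOCK):
    """Miss ratio of a fully-associative cache with LRU replacement (assumed)."""
    n_lines = cache_bytes // block
    lru = OrderedDict()               # block numbers, least recent first
    misses = 0
    for addr in trace:
        blk = addr // block
        if blk in lru:
            lru.move_to_end(blk)      # mark as most recently used
        else:
            misses += 1
            if len(lru) >= n_lines:
                lru.popitem(last=False)   # evict least recently used block
            lru[blk] = True
    return misses / len(trace)


# Hypothetical trace: two addresses 2KB apart map to the same line of a
# 2KB direct-mapped cache and keep evicting each other.
trace = [0x0000, 0x0800] * 50
dm = miss_ratio_direct(trace, 2048)
fa = miss_ratio_fully_assoc(trace, 2048)
print(dm, fa)
```

On this trace every direct-mapped access misses, while the fully-associative cache misses only on the first touch of each block. As the cache grows, fewer blocks share a line and the two organizations converge, which matches the behavior reported above.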
Fully-Associative Mixed I&D Cache Miss Ratios for PLM

Benchmark      2KB     4KB     8KB    16KB    32KB    64KB
con1         7.23%   7.23%   7.23%   7.23%   7.23%   7.23%
con6         2.12    2.12    2.12    2.12    2.12    2.12
divide10     3.16    3.16    3.16    3.16    3.16    3.16
hanoi        0.04    0.04    0.04    0.04    0.04    0.04
log10        3.66    3.66    3.66    3.66    3.66    3.66
mutest       0.11    0.11    0.11    0.11    0.11    0.11
nrev1        1.52    1.41    1.41    1.41    1.41    1.41
ops8         3.90    3.90    3.90    3.90    3.90    3.90
palin25      1.06    0.92    0.92    0.92    0.92    0.92
pri2         0.47    0.45    0.44    0.44    0.44    0.44
qs4          0.72    0.64    0.64    0.64    0.64    0.64
queens       0.29    0.29    0.29    0.29    0.29    0.29
query        0.08    0.08    0.08    0.08    0.08    0.08
times10      3.05    3.05    3.05    3.05    3.05    3.05

Table 12. Cache miss ratios for PLM using a fully-associative mixed instruction
and data cache. Various cache sizes were simulated with dineroIII. Cache block
size: 32 bytes.


Fully-Associative Mixed I&D Cache Miss Ratios for SPUR

Benchmark      2KB     4KB     8KB    16KB    32KB    64KB
con1         9.53%   9.53%   9.53%   9.53%   9.53%   9.53%
con6        11.31    2.91    2.91    2.91    2.91    2.91
divide10    12.62    3.99    3.57    3.56    3.56    3.56
hanoi        4.44    0.09    0.09    0.09    0.09    0.09
log10        5.91    5.61    5.58    5.58    5.58    5.58
mutest      11.55    2.08    0.11    0.11    0.11    0.11
nrev1        1.19    0.81    0.79    0.76    0.76    0.76
ops8        14.34   12.15    6.72    6.72    6.72    6.72
palin25     12.35    3.19    0.84    0.69    0.69    0.69
pri2        13.18    0.84    0.35    0.34    0.34    0.34
qs4          9.94    1.41    0.46    0.45    0.45    0.45
queens      12.21    9.31    3.81    0.48    0.48    0.48
query       11.52    2.66    1.89    0.10    0.10    0.10
times10     12.47    4.12    3.76    3.76    3.76    3.76

Table 13. Cache miss ratios for SPUR using fully-associative mixed instruction
and data caches of various sizes. Cache block size: 32 bytes.
6. Future Work

6.1. Optimizations

Three directions for future work are possible: optimizing our macro-expansions,
building a compiler, and augmenting SPUR's hardware to support Prolog more
directly.

6.1.1. Macro-expansion Optimizations and Compiling

As stated in Section 4.1.3, we made no attempt to optimize the macro-expansions
beyond using SPUR's tags and register windows. Many simple optimizations are
possible that will greatly improve performance. For example, using one no-op
slot in the variable dereferencing macro decreased the number of cycles
executed by the query benchmark by 2.5%. Applying peephole optimizations and
macro-expansion optimizations should yield a large improvement.

A much more ambitious project is to compile Prolog into native SPUR code
rather than using macro-expansion. In some systems, compilation can improve
performance by factors of two or three. It would be interesting to see what a
compiler for SPUR could gain in performance and code density over our
macro-expansion technique.

6.1.2. Improvements to SPUR

One shortcoming of SPUR is that tags can only be compared with immediates
(tag_cmp_br_delayed). The PLM, on the other hand, can test a subset of the
tag bits for a pattern. In the macro-expansion, tags must be read into a register,
anded with a mask, and then tested with a compare-branch instruction.
The sequence of instructions required to perform this operation is rd_tag, and,
and cmp_br_delayed (denoted R-A-C). With the SPUR simulator we measured a
15% average improvement in performance if SPUR had a single instruction to
replace the R-A-C sequence (Table 14). We calculated this by counting the
number of and instructions executed, since and is used only for the masking
operation. This is an upper bound because some of the and instructions can be done
in no-op slots instead of in the R-A-C sequence. The first three columns of Table 14
show that 87% of and instructions appear in the R-A-C sequence. Assuming that
the static distribution of and instructions approximates the dynamic distribution,
these results indicate that an improvement of more than 10% would be attainable
with the additional instruction. This instruction can be added to the SPUR
architecture without affecting the cycle time and at only a modest cost in extra
circuitry. A possible format for the instruction is shown in Figure 2.
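
The arithmetic behind these estimates can be reproduced as follows. This is a sketch only: the per-benchmark "cycles saved" figures are copied from Table 14, the function names are ours, and the cycle count passed to `cycles_saved_pct` in the example is a made-up round number rather than a value from Table 6.

```python
import math


def cycles_saved_pct(ands_executed, total_cycles):
    """Upper bound on cycles saved: a single instruction replacing R-A-C
    removes two of the three cycles for every and instruction executed."""
    return 100.0 * 2 * ands_executed / total_cycles


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


# "Cycles Saved (%)" column of Table 14, in benchmark order.
saved = [17.23, 14.71, 14.16, 9.00, 14.55, 17.06, 20.70,
         14.18, 16.79, 11.66, 17.20, 14.50, 19.18, 14.27]

upper_bound = geometric_mean(saved)     # about 15%
static_rac_fraction = 0.87              # geometric mean of the Ratio column
scaled = static_rac_fraction * upper_bound

# Hypothetical benchmark: 100 and instructions out of 1000 cycles.
example = cycles_saved_pct(100, 1000)

print(f"upper bound {upper_bound:.2f}%, scaled estimate {scaled:.1f}%")
```

Scaling the upper bound by the static fraction of and instructions that sit in an R-A-C sequence gives roughly 13%, consistent with the "more than 10%" figure stated above.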

6.2.  A  Prolog  Coprocessor  for  SPUR 

The  other  type  of  performance  improvement  we  considered  is  a  specialized 
hardware  accelerator  for  SPUR.  SPUR  supports  a  tightly-coupled  coprocessor 
Potential Performance Improvements

              Static   Static              Ands        Cycles
Benchmark     R-A-C    Ands     Ratio    Executed     Saved (%)
con1            57       63     0.90          54       17.23
con6            56       62     0.90         182       14.71
divide10       202      242     0.80         476       14.16
hanoi           48       53     0.91        6378        9.00
log10          206      247     0.83         208       14.55
mutest         112      129     0.87       21320       17.06
nrev1           68       76     0.89        6440       20.70
ops8           194      232     0.84         254       14.18
palin25        154      181     0.85        5426       16.79
pri2           131      150     0.87       11426       11.66
qs4             92      104     0.88        8293       17.20
queens         213      249     0.86        5551       14.50
query          299      319     0.94       43757       19.18
times10        202      242     0.83         386       14.27
Geom. mean                      0.87                   15.08

Table 14. This table illustrates the possible performance improvement if an
instruction that applies a mask to a tag and branches on the result were added to
SPUR. The new instruction would replace a sequence of three current
instructions. We computed the improvement by dividing twice the number of and
instructions executed by the total number of cycles currently executed for each
benchmark (Table 6), assuming all and instructions appear in the R-A-C
sequence.


  +--------+--------+---------+--------+--------+--------+
  | opcode | eq/neq | src reg |  mask  | value  | offset |
  +--------+--------+---------+--------+--------+--------+
    7 bits   1 bit    5 bits    5 bits   5 bits   9 bits

Figure 2. This figure shows a possible format for a combined read-and-compare
tag instruction for SPUR. The operands are a flag indicating test for equal/not
equal, the source register for the tag, a constant to be masked with the tag, a
constant value to be tested with the tag, and an offset to add to the program counter
if the test is successful.
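
The field layout of Figure 2 adds up to one 32-bit word (7 + 1 + 5 + 5 + 5 + 9). The following sketch packs and unpacks such a word and evaluates the branch condition. Several details are assumptions made for illustration only: the fields are packed left to right starting at the most significant bit, an eq/neq flag of 1 means "branch if equal", the offset is treated as a raw unsigned field, and the opcode value 0x55 is invented.

```python
# (name, width in bits), in the left-to-right order shown in Figure 2.
FIELDS = [("opcode", 7), ("eq", 1), ("src", 5),
          ("mask", 5), ("value", 5), ("offset", 9)]


def encode(**fields):
    """Pack the six fields into one 32-bit instruction word."""
    word = 0
    for name, width in FIELDS:
        v = fields[name]
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word = (word << width) | v
    return word


def decode(word):
    """Unpack a 32-bit instruction word into a dict of fields."""
    out = {}
    shift = 32
    for name, width in FIELDS:
        shift -= width
        out[name] = (word >> shift) & ((1 << width) - 1)
    return out


def branch_taken(instr, tag):
    """Mask the tag, compare with the constant, honor the eq/neq flag."""
    match = (tag & instr["mask"]) == instr["value"]
    return match if instr["eq"] else not match


# Hypothetical instruction: branch if the top two tag bits are 01.
word = encode(opcode=0x55, eq=1, src=3, mask=0b11000, value=0b01000, offset=12)
instr = decode(word)
print(hex(word), branch_taken(instr, 0b01011), branch_taken(instr, 0b11011))
```

A single instruction of this shape does the work of the rd_tag, and, and cmp_br_delayed sequence: the mask and comparison constant travel in the instruction word instead of in registers.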

model where the coprocessor and CPU are on the same side of the system bus and
make use of the same caches [Hansen86]. Instruction fetching is under strict
control of the CPU. However, execution of some instructions is deferred to the
coprocessor. The CPU initiates loads and stores into and out of the coprocessor,
but the coprocessor latches the data or supplies it directly to the cache. This is
ideal for floating point units that otherwise would have too high an overhead if
bus accesses were required for each operation. In contrast to SPUR, the PLM
employs the loosely-coupled model. The coprocessor is not on the same board as
the CPU, and the setup of the computation and reading of the results must be
done over the system bus.

We considered designing a tightly-coupled coprocessor for SPUR implementing
a subset of the PLM instructions. We identified the basic operations performed
by the PLM and then determined which could be performed efficiently
with standard SPUR instructions and which would benefit from the use of special
coprocessor hardware. We suggest extensions to SPUR's coprocessor interface
and a coprocessor architecture that allows SPUR to execute Prolog at the same
speed as the PLM.

6.2.1.  Issues  in  Coprocessor  Design 

The current SPUR coprocessor interface is quite limited. All memory operations
to or from coprocessor registers are initiated by the CPU; the coprocessor
can only manipulate the contents of its registers. The only coprocessor actively
planned by the SPUR group is a floating point unit, which has heavily influenced
the interface. For our purposes, the current coprocessor interface is unworkable
since Prolog does not perform sophisticated bit manipulations of registers as do
floating point coprocessors, but instead reads, compares, and updates the contents
of memory locations. We will describe the extensions we feel are necessary to add
an arbitrary (possibly microcoded) coprocessor to SPUR.

Our Prolog coprocessor design greatly reduces the size of PLM programs on
SPUR because it executes much higher level instructions than does SPUR. This
reduces SPUR's static and dynamic code sizes, increasing cache and instruction
buffer performance†.

For this coprocessing model to be compatible with the RISC nature of SPUR,
two simple rules must be enforced. First, all instructions must be restartable. If a
coprocessor instruction causes a page fault, it must allow the CPU to perform the
necessary operations to bring that page into memory. The CPU then reissues the
instruction that caused the trap just as it does for all CPU instructions.
Therefore, coprocessor instructions cannot change the internal state of the coprocessor
until all memory references have been completed. The second rule is that a
system interrupt must not be serviced until all coprocessor instructions in progress
have been allowed to complete (unless prevented from doing so by a page fault).
This will ensure consistent changes to the internal state of the coprocessor.


† In general, if a collection of instructions can be found that contribute greatly to the
execution time of certain applications, they can be built into a (microcoded) coprocessor.
In fact, a standard, tailorable, micro-engine coprocessor could be designed and tailored to
different applications.
Interrupts are currently handled this way for SPUR's floating-point coprocessor.

These two rules at first seem incompatible with the indeterminate and recursive
PLM instructions. The most important case of this is the PLM's implementation
of the unify instruction. It performs a pattern matching of two arbitrarily
large data structures and cannot be suspended and restarted. Hence the PLM
must wait for page traps to be resolved by the NCR coprocessor that acts as its
host. In contrast, SPUR must service its own page faults. In the next section we
show that the unify operation can be unwound so that only one pattern matching
step is performed per coprocessor instruction, making the instruction restartable.
The unwound instructions have a much lower bound on their execution time and
this allows the second rule to be enforced without delaying interrupt servicing
appreciably.

6.2.2.  Extensions  to  SPUR’s  Coprocessor  Interface 

Besides  the  addition  of  the  cache  address  bus  and  cache  operation  lines  to  the 
coprocessor,  the  only  other  additions  are  a  page  fault  line  and  the  coprocessor 
memory  access  line.  When  a  coprocessor  instruction  needs  to  generate  a  data 
load  or  store  it  asserts  the  memory  access  line  one  cycle  in  advance  to  prevent 
SPUR’s  instruction  fetch  unit  from  attempting  a  cache  operation.  In  effect,  this  is 
a  cache  bus  arbitration  line  that  always  resolves  in  favor  of  the  coprocessor.  The 
needed  functionality  is  already  present  in  SPUR’s  prefetch  unit  in  the  circuitry 
used  to  resolve  instruction  and  data  access  collisions.  Support  for  this  facility 
consists  of  a  pin  dedicated  to  the  memory  access  line  and  an  internal  OR  gate  to 
make  this  line  appear  as  a  CPU  instruction  generating  a  data  load  or  store. 
Traps  and  exceptions  are  handled  the  same  way  as  before.  The  important  rules 
to  follow  are  (again):  instructions  must  be  restartable  and  interrupts  wait  for 
instructions  to  complete. 

6.2.3.  Prolog  Coprocessor  Architecture 

The  coprocessor  design  was  strongly  influenced  by  the  PLM.  We  looked  at 
the  detailed  operations  each  instruction  performed  and  determined  which  ones 
would  be  easily  and  efficiently  handled  by  SPUR  CPU  instructions,  which  ones 
require  complex  tag  and  pointer  manipulations  and  have  to  be  unwound  so  that 
they  can  be  restartable,  and  which  ones  could  be  implemented  directly. 

We  placed  all  Prolog  execution  state  (i.e.  the  registers  of  the  PLM)  in  the 
coprocessor.  These  are  required  by  most  instructions  and  placing  them  in  the 
coprocessor  enables  optimizations  for  choice  point  buffering.  Currently  we  do  not 
make  use  of  the  register  windows  although  these  could  easily  be  implemented 
directly  in  the  coprocessor.  Register  file  reading  and  writing  is  identical  to  the 
SPUR  CPU  including  the  pipeline  forwarding  logic.  Provisions  are  made  for  extra 
global  registers  and  a  special  NIL  register. 

All system support functions are performed as they would be for any other
program running on SPUR. Special system or library calls for supporting Prolog
can be incorporated in the SPUR operating system. These would be a subset of
the escape codes used by the PLM; most would already be available. Code and
data segments are managed in the same way as the SPUR-only implementation
described above.

6.2.3.1. Coprocessor Instruction Set

Coprocessor instructions can be divided into six groups. A complete list of
these and an outline of their microcode are provided in Appendices 3 and 4. The
first group includes three types of data transfer instructions. These move data
between the coprocessor and memory, between the coprocessor and the CPU, and
between registers in the coprocessor. The second group is the state modifying and
saving instructions. These are used to push and pop choice points and
environments from their respective stacks as well as to set the mode bits. Compare and
branch instructions make up the third group. These include a read, mask, and
compare tag instruction such as the one suggested for SPUR previously in this
section and a condition code test instruction. The next group is the unwound
unify instructions that are discussed in more detail below. The fifth group is heap
and trail manipulation instructions. These are used to allocate variables on the
top of the heap and undo the bindings on the trail stack at goal failure. The last
group consists of the special hash instruction used by the PLM to implement a
multi-way branch based on the value of an argument. This is currently
implemented as a linear search of a table, as it is in the PLM.
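
A multi-way branch implemented as a linear table search can be sketched as follows. This only illustrates the idea; the table layout, the function name `hash_branch`, and the label strings are all hypothetical and do not describe the actual PLM or coprocessor microcode.

```python
def hash_branch(table, key, fail_target):
    """Return the branch target for key by scanning (key, target) pairs in
    order; fall back to fail_target when no entry matches."""
    for entry_key, target in table:
        if entry_key == key:
            return target
    return fail_target


# Hypothetical dispatch table for a predicate indexed on its first argument.
table = [("nil", "L_nil"), ("a", "L_a"), (42, "L_42")]

print(hash_branch(table, "a", "L_fail"))
print(hash_branch(table, "zzz", "L_fail"))
```

The cost of a linear search grows with the number of table entries, so it is attractive mainly when the tables are short.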

6.2.3.2. Unwinding the Unify Instruction

To unwind the unify instruction we need to add two special registers to the
PLM architecture holding the addresses of the next two items to unify. When a
unify instruction cannot complete after having unified the original arguments,
rather than continuing as in the PLM, it places the intermediate arguments in
these registers (U1 and U2). Unify instructions that have the possibility of
becoming recursive (unification with constants and variables cannot possibly be
recursive) are followed by a compare and branch instruction that tests whether there is
more to unify or not. If there is, the branch back to the unify instruction is
taken; if not, execution proceeds sequentially. When the unify instruction begins
execution it first checks the "more to unify" mode bit and, if it is set, continues
execution using the contents of the U1 and U2 registers. If pointers need to be
pushed onto the push down list (PDL) this is done at the beginning of the unify
instructions; if they need to be popped, this is done at the end. If there is nothing
on the PDL then the "more to unify" bit is reset. Therefore the unify arguments
are always available at the beginning of instruction execution.
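
The control flow of the unwound unify can be sketched in software. The code below is a simplified model, not the coprocessor microcode: the term representation (a `Var` class, tuples for structures, plain Python values for constants), the class and function names, and the omission of the trail and of tag handling are all our assumptions. Each call to `unify_step` stands for one coprocessor instruction, and the `while` loop in `unify` plays the role of the compare and branch instruction that follows it.

```python
class Var:
    """Unbound or bound logic variable (hypothetical representation)."""
    def __init__(self, name):
        self.name = name
        self.ref = None


def deref(term):
    while isinstance(term, Var) and term.ref is not None:
        term = term.ref
    return term


class CoprocessorState:
    def __init__(self):
        self.u1 = self.u2 = None   # next two items to unify
        self.more = False          # the "more to unify" mode bit
        self.pdl = []              # push down list of pending pairs
        self.steps = 0             # number of unify instructions issued


def unify_step(st, a=None, b=None):
    """One bounded pattern matching step; returns False on failure."""
    st.steps += 1
    if st.more:                    # continue from U1/U2 instead of a, b
        a, b = st.u1, st.u2
    a, b = deref(a), deref(b)
    ok = True
    if a is b:
        pass
    elif isinstance(a, Var):
        a.ref = b                  # bind (no trail or occurs check here)
    elif isinstance(b, Var):
        b.ref = a
    elif isinstance(a, tuple) and isinstance(b, tuple):
        if a[0] == b[0] and len(a) == len(b):
            st.pdl.extend(zip(a[1:], b[1:]))   # push argument pairs
        else:
            ok = False
    else:
        ok = (a == b)              # constants
    if ok and st.pdl:
        st.u1, st.u2 = st.pdl.pop()            # pop at the end of the step
        st.more = True
    else:
        st.pdl.clear()
        st.more = False
    return ok


def unify(st, a, b):
    ok = unify_step(st, a, b)
    while ok and st.more:          # compare and branch back to unify
        ok = unify_step(st)
    return ok


X, Y = Var("X"), Var("Y")
st = CoprocessorState()
ok = unify(st, ("f", X, ("g", "b"), "c"), ("f", "a", ("g", Y), "c"))
print(ok, deref(X), deref(Y), st.steps)
```

Because every step does a bounded amount of work and leaves its continuation in U1, U2, the mode bit, and the PDL, a step can be abandoned and reissued, and an interrupt never has to wait for more than one step.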

There is a performance penalty for a two-instruction loop in SPUR, however.
SPUR does not support delayed slot cancellation when a branch is taken. In the
case of the unify loop, adding cancellation would eliminate many of the no-ops
required in the code. Without it, we are forced to use a three-instruction
(dynamic) loop where one of the instructions is a no-op. However, when one
considers the time spent by the coprocessor in the unify operation in each loop,
this extra cost is not large.

6.2.3.3. Interfacing to the SPUR CPU Pipeline

The SPUR pipeline consists of four stages: instruction fetch, register read and
address generation, memory access, and register write. The registers allow a
single write and two reads in a single cycle. Results of compare instructions must be
generated by the end of the second stage so that only one instruction is in the
pipeline after the one that causes the branch. This is what is meant by delayed
branch. The CPU pipeline can be suspended by the coprocessor by asserting the
busy line. This facility allows the coprocessor to arbitrarily extend any cycle for
its own purposes. Our coprocessor will not be used in parallel mode, to avoid
cache access conflicts. Also, unlike floating point instructions, our instructions are
indeterminate in duration and it would be difficult for a compiler to schedule
parallel execution.

The Prolog coprocessor must interface to the pipeline of the SPUR CPU. To
give the coprocessor the extra time that certain instructions may require, we
expand the pipeline between the second and third pipeline stages. The
coprocessor is pipelined in the same way as the CPU. Figure 3 explains the coprocessor
pipeline graphically.


Coprocessor instructions:   I   R   E   M   Hold   W
                            I   Hold   R   E   M   W
CPU instruction:            I   Hold   R   M   W

Figure 3. Coprocessor pipeline. The coprocessor finds the extra cycles it may
need to execute an instruction between the second and third stages of the SPUR
pipeline. By using this extended area and the coprocessor memory access control
line (an extension to the current coprocessor interface of SPUR) we ensure that no
cache access conflicts occur. The overlap of macro-instructions in the SPUR
pipeline is one of the major sources of improved execution time for SPUR over the
PLM. (Cycle names: I - instruction fetch, R - register read, E - instruction
execution, M - memory access, W - register write.)
An important consideration mentioned previously is that coprocessor state
can only be changed after all memory accesses have been successfully completed.
To meet this requirement, a set of memory address and data registers is added
to the coprocessor to act as staging areas for all memory transactions. Once the
memory accesses are all completed, the contents of the staging registers can be
moved to internal registers. This is the primary mechanism for ensuring
instruction restartability and why it is critical that instructions not be interrupted while
updating internal state.
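
The staging discipline can be modeled in a few lines. The sketch below is a software analogy under stated assumptions, not a description of the hardware: the page size, the register names R1 and R2, the `load_pair` instruction, and the way a page fault is serviced are all invented for illustration.

```python
PAGE_SIZE = 4096  # hypothetical page size


class PageFault(Exception):
    """Raised when an access touches a page that is not resident."""


class Memory:
    def __init__(self, words, resident_pages):
        self.words = dict(words)
        self.resident = set(resident_pages)

    def load(self, addr):
        if addr // PAGE_SIZE not in self.resident:
            raise PageFault(addr)
        return self.words[addr]


class Coprocessor:
    def __init__(self):
        self.regs = {"R1": 0, "R2": 0}   # hypothetical internal registers

    def load_pair(self, mem, addr1, addr2):
        """Hypothetical instruction that reads two words from memory."""
        # Phase 1: all memory accesses go to staging registers. A fault here
        # leaves self.regs untouched, so the instruction can be reissued.
        staging = (mem.load(addr1), mem.load(addr2))
        # Phase 2: commit. No memory access (and hence no fault) occurs here.
        self.regs["R1"], self.regs["R2"] = staging


def run_restartable(cop, mem, addr1, addr2):
    """CPU side: service page faults and reissue the faulting instruction."""
    faults = 0
    while True:
        try:
            cop.load_pair(mem, addr1, addr2)
            return faults
        except PageFault as fault:
            faults += 1
            mem.resident.add(fault.args[0] // PAGE_SIZE)  # bring the page in


mem = Memory({0x0010: 7, 0x2000: 9}, resident_pages={0})
cop = Coprocessor()
faults = run_restartable(cop, mem, 0x0010, 0x2000)
print(faults, cop.regs)
```

In this example the second load faults after the first has already succeeded. Because the first value was only staged, the registers still hold their old contents when the instruction is reissued.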

6.2.4.  Expected  Performance  of  SPUR  with  a  Prolog  Coprocessor 

We  expect  the  performance  of  SPUR  with  a  Prolog  coprocessor  to  be  at  least 
as  high  as  the  PLM  but  with  smaller  and  less  complex  microcode.  In  Table  15  we 
see  that  execution  time  is  approximately  10%  better  than  the  PLM.  However,  the 
code  size  required  for  the  coprocessor,  although  much  less  than  for  SPUR  alone,  is 
still  a  factor  of  3.4  larger  than  the  PLM  (Table  16). 
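
The entries of Tables 15 and 16 follow a simple rule: each weight is the instruction frequency multiplied by its cost (cycles or bytes), and the relative figure is the ratio of the two weight totals. A short sketch, using one row and the printed totals from the tables (the function name is ours):

```python
def weight(freq, cost):
    """Frequency-weighted cost of one instruction, rounded as in the tables."""
    return round(freq * cost, 1)


# get_list row of Table 15: frequency 7.27, 10 PLM cycles, 7 SPUR-CoP cycles.
plm_w = weight(7.27, 10)
cop_w = weight(7.27, 7)

# Totals as printed in Tables 15 and 16.
relative_time = round(628.6 / 694.1, 2)       # SPUR-CoP cycles / PLM cycles
relative_code_size = round(989.3 / 287.3, 2)  # SPUR-CoP bytes / PLM bytes

print(plm_w, cop_w, relative_time, relative_code_size)
```

A relative execution time of 0.91 corresponds to the "approximately 10% better" claim, and the code size ratio of 3.44 is the factor of 3.4 quoted above.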

In  summary,  many  instructions  have  been  eliminated  and  instruction  decode 
is  greatly  simplified.  All  instructions  are  4  bytes  with  standard  argument  formats. 
The  SPUR  pipeline  does  not  impart  high  overhead,  as  most  instructions  require  a 
register  read  or  write.  All  extra  micro-cycles  required  by  the  coprocessor  are 
available  by  suspending  the  CPU  pipeline  and  thus  ensuring  the  absence  of  cache 
access  conflicts. 

Our claim of 10% better performance is not as startling as it may seem. The
SPUR macro-instruction cycle time is the same as the PLM micro-instruction
cycle time. The PLM micro-engine, although pipelined, does not exploit
macro-instruction overlap. SPUR's macro-instructions provide many of the primitive
operations implemented as more than one cycle in the PLM. SPUR with a
coprocessor is faster for the simple instructions and is only slightly slower than the
PLM for the recursive unify operations. We feel that this claim is justified since
the micro-engine and microcode that must be implemented in the coprocessor are
a subset of the PLM's and most operations are performed directly in SPUR
instructions. A conservative estimate of the coprocessor's microcode size is 3.3KB,
compared to PLM's 17KB. This assumes a 96-bit wide microinstruction and 274
microwords.
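
The microcode size estimate follows directly from the stated word width and word count. The check below assumes that "KB" is meant as 1000 bytes, which is our reading; with 1024-byte kilobytes the figure would be about 3.2.

```python
MICROWORD_BITS = 96   # width of one microinstruction, from the text
MICROWORDS = 274      # number of microwords, from the text

microcode_bytes = MICROWORD_BITS * MICROWORDS // 8
microcode_kb = microcode_bytes / 1000      # assuming 1 KB = 1000 bytes

print(microcode_bytes, round(microcode_kb, 1))
```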
Comparison of Execution Times

                                     PLM              SPUR-CoP
Instruction            Freq     Cycles  Weight     Cycles  Weight
put_value             10.70       3      32.1        1.5    16.1
unify_variable         8.80       7      61.6        8      70.4
get_list               7.27      10      72.7        7      50.9
unify_cdr              6.88       5      34.4        4.5    31.0
unify_value            4.96      14.5    71.9       14.5    71.9
escapes                4.90       -       -          -       -
switch_on_term         4.87      11      53.6        5      24.4
unify_nil              4.86       4      19.5        4.5    21.9
get_structure          4.11      12      49.3       13.5    55.5
execute                4.01       1       4.0        2       8.0
allocate               3.47      11      38.2        5      17.4
get_variable           3.44       2.5     8.6        1.5     5.2
unify_constant         3.33       8      26.6       10      33.3
deallocate             2.87       6      17.2        4      11.5
put_constant           2.71       2       5.4        2       5.4
proceed                2.65       1       2.7        4      10.6
try_me_else            2.45      20      49.0       20      49.0
call                   2.00       1       2.0        4       8.0
cut                    1.85      10      18.5        5.5    10.2
get_constant           1.83      11      20.1       10      18.3
put_variable           1.79       3.5     6.3        2.5     4.5
get_value              1.44      13      18.7       20      28.8
trust_me_else          1.32       5       6.6        3       4.0
get_nil                1.29      11      14.2        7       9.0
put_unsafe_value       1.24      10      12.4        6       7.4
retry_me_else          0.88       2       1.8        6       5.3
switch_on_structure    0.87      13      11.3       13      11.3
put_list               0.77       3       2.3        2       1.5
try                    0.71      20      14.2       22      15.6
fail                   0.56      23      12.9       21      11.8
trust                  0.35       5       1.8        5       1.8
unify_void             0.33       6       2.0       16       5.3
switch_on_constant     0.20      10       2.0       13       2.6
retry                  0.06       2       0.1        4       0.2
put_structure          0.05       4       0.2        6       0.3
put_nil                0.01       2       0.0        1       0.0
Total Weights                           694.1              628.6
Relative Performance                      1.00               0.91

Table 15. This table compares performance of Prolog on the PLM and SPUR using a
specialized Prolog coprocessor. The figures for instruction frequency and PLM cycle time
are summarized from [Dobry85] using more up-to-date values. The SPUR implementation
is simple macro-expansions and does not consider what could be achievable by shuffling
instructions to take better advantage of delayed load and store slots or other optimizations.


Comparison of Code Size

                                     PLM              SPUR-CoP
Instruction            Freq.    Bytes   Weight     Bytes   Weight
put_value             10.70       3      32.1        4      42.8
unify_variable         8.80       2      17.6        8      70.4
get_list               7.27       2      14.5        8      58.2
unify_cdr              6.88       2      13.8        8      55.0
unify_value            4.96       2       9.9       12      59.5
escapes                4.90       -       -          -       -
switch_on_term         4.87       4      19.5       16      77.9
unify_nil              4.86       1       4.9        8      38.9
get_structure          4.11       6      24.7       24      98.6
execute                4.01       5      20.1        8      32.1
allocate               3.47       1       3.5        4      13.9
get_variable           3.44       3      10.3        4      13.8
unify_constant         3.33       5      16.7       20      66.7
deallocate             2.87       1       2.9        4      11.5
put_constant           2.71       6      16.3        8      21.7
proceed                2.65       1       2.7       16      42.4
try_me_else            2.45       5      12.3       16      39.2
call                   2.00       6      12.0       16      32.0
cut                    1.85       3       5.6       10      18.5
get_constant           1.83       6      11.0       20      36.6
put_variable           1.79       3       5.4        6      10.7
get_value              1.44       3       4.3       14      20.2
trust_me_else          1.32       1       1.3        4       5.3
get_nil                1.29       2       2.6        8      10.3
put_unsafe_value       1.24       3       3.7        4       5.0
retry_me_else          0.88       5       4.4       24      21.1
switch_on_structure    0.87       6       5.2       36      31.3
put_list               0.77       2       1.5        8       6.2
try                    0.71       5       3.6       24      17.0
fail                   0.56       1       0.6       24      13.4
trust                  0.35       5       1.8       12       4.2
unify_void             0.33       2       0.7       16       5.3
switch_on_constant     0.20       6       1.2       36       7.2
retry                  0.06       5       0.3       24       1.4
put_structure          0.05       6       0.3       20       1.0
put_nil                0.01       ?       0.0        4       0.0
Total Weights                           287.3              989.3
Relative Code Size                        1.00               3.44

Table 16. This table compares the code size of Prolog programs on the Berkeley PLM
and on SPUR with a Prolog coprocessor. The SPUR code size is calculated from simple
macro-expansion of the instructions. Constants form part of some PLM instructions. For
SPUR, we counted the fetching of these constants as part of the code size cost.
7. Conclusions

In summary, we expect that a macro-expansion of the PLM instruction set
will run Prolog programs on SPUR at 43% of the speed of the PLM when
memory accesses are assumed to be one cycle. SPUR with a Prolog coprocessor
can be made to be 10% faster than the PLM due to simplification of the
micro-engine and the advantages of pipelining across instructions.
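
Both percentages follow from cycle ratios reported earlier in the paper, under the assumption that the two machines have the same cycle time, so that relative speed is simply the inverse of the relative cycle count. A quick check (variable names are ours):

```python
spur_cycles_vs_plm = 2.31   # macro-coded SPUR cycles relative to PLM (abstract)
cop_cycles_vs_plm = 0.91    # SPUR with coprocessor relative to PLM (Table 15)

# Equal cycle times assumed: speed is the inverse of the cycle ratio.
spur_speed_pct = 100 / spur_cycles_vs_plm
cop_speedup_pct = (1 / cop_cycles_vs_plm - 1) * 100

print(round(spur_speed_pct), round(cop_speedup_pct))
```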

When cache and instruction fetch behavior is taken into account we suspect
that the pure SPUR implementation will suffer greatly. Due to the expanded code
size of the SPUR implementation, the SPUR cache will have to be 4 to 8 times
larger than one used for the PLM in order to obtain the same miss ratio. Also
SPUR will require a somewhat larger processor-to-memory bandwidth because of
the expanded code size and tag storage. The coprocessor implementation will
remain very close to PLM performance due to the much more similar code
density. There is much more work to be done on the coprocessor design before a
definitive statement can be made.

However, we feel that the system support advantages of SPUR make it
competitive with the PLM for large applications, with or without the coprocessor.
This is especially true when one considers large real applications that involve a
large number of interactions with the operating system for I/O or are
floating-point arithmetic intensive. The utility of a SPUR Prolog implementation is
especially important in mixed paradigm programming systems where only a part of
the computations would be in the logic programming paradigm. SPUR is
reasonably high-performance for Prolog and very competitive in running other
languages. The PLM's special hardware and loosely-coupled coprocessor model
make running mixed-language applications less efficient.

A Prolog coprocessor for SPUR can be added when applications demand
improved logic programming performance. The coprocessor interface changes
required to support microcoded accelerators are minimal. The architecture of the
coprocessor is a hybrid of the PLM and SPUR architectures. We feel that a
tightly-coupled VLSI Prolog coprocessor for SPUR is a viable alternative to a
specialized loosely-coupled Prolog accelerator such as the PLM.

The  bottom  line  of  this  study  is  that  SPUR  can  support  a  language  other 
than  Lisp  or  C  with  excellent  performance.  In  fact,  SPUR  would  place  third 
among  the  Prolog  implementations  listed  in  Table  17,  and  with  a  coprocessor,  it 
would  be  the  fastest. 
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Performance  Estimates  for  Logic  Programming  Systems 

Deterministic Concatenate Benchmark (con1)

Machine              System             Performance   Reference
                                        (in LIPS)
---------------------------------------------------------------------------
Berkeley SPUR        Coprocessor            465,000   estimate
Berkeley PLM         (TTL)/Compiled         425,000   simulation (no wait states)
Tick & Warren        VLSI                   415,000   estimate, Tick & Warren
Aquarius I           (TTL)/Compiler         305,000   simulation (NCR bus)
Berkeley SPUR        Macro-expansion        184,000   simulation
Symbolics 3600       Microcoded             110,000   estimate, Tick & Warren
DEC 2060             Warren Compiled         43,000   Warren
Japan 5th Gen PSI    Microcoded              30,000   estimate, PSI paper
IBM 3033             Waterloo                27,000   Warren
DEC VAX-11/780       Macrocoded              15,000   estimate, Tick & Warren
Sun-2                Quintus Compiler        14,000   Warren
LMI/Lambda           Uppsala                  8,000   Warren
DEC VAX-11/780       POPLOG                   2,000   Warren
DEC VAX-11/780       M-PROLOG                 2,000   Warren
DEC VAX-11/780       C-PROLOG                 1,500   Warren
Symbolics 3600       Interpreter              1,500   Warren
DEC PDP-11/70        Interpreter              1,000   Warren
Z-80                 MicroProlog                120   Warren
Apple-II             Interpreter                  8   Warren

Performance on General Benchmark Programs

Machine              System             Performance   Reference
                                        (in LIPS)
---------------------------------------------------------------------------
Berkeley SPUR        Coprocessor            225,000   estimate
Berkeley PLM         (TTL)/Compiled         205,000   simulation
Berkeley SPUR        Macro-expansion         89,000   simulation
LMI/Lambda           Micro/Compiled          12,400   LMI Corp.
Japan 5th Gen PPC    Microcoded              10,000   estimate, NTIS (#N83-31379)
LM-2                 Microcoded               9,500   Prolog Digest v2.20
LMI/Lambda           Macro/Compiled           6,200   LMI Corp.
Symbolics 3600       Microcoded               5,000   Prolog Digest v2.20
LMI/Lambda           Micro/Interpreter        3,400   LMI Corp.
LMI/Lambda           Macro/Interpreter        1,700   LMI Corp.
Apple-II             Pascal Interpreter          10   Colmerauer

Performance on the Warren Benchmarks
(times10 divide10 log10 ops8 palin25 query list30 list50)

Machine              System             Performance   Reference
                                        (in LIPS)
---------------------------------------------------------------------------
Berkeley SPUR        Coprocessor            163,000   estimate
Berkeley PLM         (TTL)/Compiled         149,216   simulation
Berkeley SPUR        Macro-expansion         60,000   simulation (excl. list30,50)
LMI/Lambda           Micro/Compiled          12,400   LMI Corp.
DEC 2060             Warren Compiled         12,175   Warren thesis

Table 17. Performance of various Prolog implementations. This table was
adapted from [Dobry85].
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Appendices 

There are four appendices. Appendix 1 contains listings of the software tools
developed to macro-expand PLM instructions to SPUR instructions, and Appendix
2 lists the actual code used to implement each PLM instruction. Appendix 3
describes the code for macro-expansion onto SPUR with a Prolog coprocessor.
The numbers in Tables 15 and 16 were derived from these macro-expansions.
The final appendix is an outline of the microcode that will be required for
SPUR's Prolog coprocessor.




Special-  or  General-Purpose  Hardware  for  Prolog 


October 1986


Appendix  1:  Listings  of  Software  Tools 

This  appendix  contains  listings  of  the  two  software  tools  that  we  wrote  to 
allow  us  to  automatically  macro-expand  Prolog  programs  from  PLM  code  to 
SPUR.  The  first  program  is  preproc.c.  Its  purpose  is  to  put  the  PLM  instruc¬ 
tions  into  a  format  that  can  be  fed  to  the  C  preprocessor  and  to  set  up  the  con¬ 
stant  table.  The  second  program  is  postproc.c.  Its  purpose  is  to  turn  local 
labels  generated  by  the  macros  into  global  labels.  It  produces  output  suitable  as 
input  to  the  SPUR  assembler.  The  final  listing  shows  a  csh  command  file  to  con¬ 
vert  PLM  code  into  a  binary  file  that  can  be  run  on  the  SPUR  simulator. 
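
The first step can be pictured with a small sketch. The function below is not taken from preproc.c; it is a minimal illustration, assuming the simplest case (an instruction name followed by an optional argument string, with no label, constant or escape handling), of how a PLM line is turned into the macro-call form that the C preprocessor expects. The name plm_to_macro and its signature are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Rewrite one PLM line "instr args" as the macro call "instr(args)".
 * Hypothetical helper for illustration only: the real preproc.c
 * dispatches on the instruction name and treats labels, constants,
 * jumps and escapes separately.
 * Returns 1 on success, 0 if the result does not fit in out.
 */
int plm_to_macro(const char *line, char *out, size_t outSize)
{
    size_t nameLen = strcspn(line, " \t");   /* length of the instruction name */
    const char *args = line + nameLen;
    int n;

    args += strspn(args, " \t");             /* skip white space before the args */
    n = snprintf(out, outSize, "%.*s(%s)", (int) nameLen, line, args);
    return n >= 0 && (size_t) n < outSize;
}
```

Under these assumptions, the line "get_list X1" becomes "get_list(X1)", and an instruction without arguments such as "proceed" becomes "proceed()", so each PLM instruction can be expanded by a cpp macro of the same name.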

The  macro-expansion  process  uses  /lib/cpp.  The  input  to  cpp  is: 

    headers.h
    defs.h
    instructions.h  (macro definitions)
    PLM code preprocessed with preproc
    funcs.a
    trailers.h
    constants table (created by preproc)

(The files headers.h, defs.h, instructions.h, trailers.h and funcs.a are
listed in Appendix 2.) The output from cpp is post-processed with postproc
before it is assembled and linked with sas and sld.
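
The label handling done in the post-processing step can be sketched as follows. This is a simplified illustration, not the postproc.c listing: it assumes that a local label is seen in one of three forms (a definition, marked ':', a forward reference, marked 'f', or a backward reference, marked 'b') and that each binding of a label receives a fresh global number. The names resolve_label, Label and MAX_LABELS are hypothetical, and a linear table stands in for the hash table used by the real tool.

```c
#include <string.h>

#define MAX_LABELS 64

typedef struct {
    char name[16];
    int  number;      /* global number currently bound to this local label */
    int  forwardRef;  /* nonzero while a forward reference awaits its definition */
} Label;

static Label labels[MAX_LABELS];
static int labelCount = 0;
static int nextNumber = 1;

/*
 * kind is ':' for a definition, 'f' for a forward reference and
 * 'b' for a backward reference.  Returns the global label number,
 * or -1 for an undefined backward reference or a full table.
 */
int resolve_label(const char *name, char kind)
{
    Label *l = NULL;
    int i;

    for (i = 0; i < labelCount; i++) {
        if (strcmp(labels[i].name, name) == 0) {
            l = &labels[i];
            break;
        }
    }
    if (l == NULL) {
        if (kind == 'b' || labelCount == MAX_LABELS ||
            strlen(name) >= sizeof labels[0].name) {
            return -1;
        }
        l = &labels[labelCount++];
        strcpy(l->name, name);
        l->forwardRef = 0;
    }
    switch (kind) {
    case ':':
        if (l->forwardRef) {
            l->forwardRef = 0;            /* reuse the number given to earlier 'f' uses */
        } else {
            l->number = nextNumber++;     /* new binding of this local label */
        }
        return l->number;
    case 'f':
        if (!l->forwardRef) {
            l->forwardRef = 1;
            l->number = nextNumber++;     /* reserve the number of the next definition */
        }
        return l->number;
    case 'b':
        return l->number;                 /* most recent definition */
    }
    return -1;
}
```

With this scheme the same local label name can be reused in every macro expansion: a forward reference and the definition that follows it share one number, and a later redefinition of the same name receives a new one.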
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r 

/*
 * preproc.c --
 *
 *      A filter that takes PLM assembly code and converts it
 *      into a form that /lib/cpp can handle.
 */

typedef int Boolean;

#define TRUE    1
#define FALSE   0
#include "list.h"

#undef NULL
#include <stdio.h>
#include <ctype.h>
#include <strings.h>

#define CONST_TABLE_START       44
int constTableOffset = CONST_TABLE_START;

#define MAX_CONST_OFFSET        0x1000

#define NIL_STR         (char *) NULL

#define LINE_SIZE       80

typedef enum {
    LABEL,
    STRING,
    NUMBER,
} ConstType;

typedef struct {
    List_Links  links;
    char        name[LINE_SIZE + 1];
    ConstType   type;
    int         offset;
} ConstRec;

FILE *constFilePtr = (FILE *) NULL;
FILE *oldFilePtr = stdin;
FILE *curFilePtr = stdin;

typedef struct {
    List_Links  links;
    char        name[LINE_SIZE + 1];
    char        command[LINE_SIZE + 1];
    int         offset;
} EscapeRec;

typedef struct {
    List_Links  links;
    char        name[LINE_SIZE + 1];
    int         (*func)();
} InstrRec;

#define HASH_SIZE 137

List_Links instrHashTable[HASH_SIZE];
List_Links constHashTable[HASH_SIZE];
List_Links escapeHashTable[HASH_SIZE];

#define HASH_FUNC(name, len) ((len + (name[0] * name[len - 1])) % HASH_SIZE)

char *SkipWhiteSpace();
char *FindWhiteSpace();

InstrRec *InstrHashFind();
void InstrHashInsert();

ConstRec *ConstHashFind();
ConstRec *ConstHashInsert();

EscapeRec *EscapeHashFind();
void EscapeHashInsert();


/*
 * Main --
 *
 *      Opens the input file and calls Preprocess to interpret
 *      the file.
 */
main(argc, argv)
    int argc;
    char **argv;
{
    InitHash();
    if (argc > 1) {
        constFilePtr = fopen(argv[1], "w");
        if (constFilePtr == (FILE *) NULL) {
            fprintf(stderr, "Can't open %s\n", argv[1]);
            exit(1);
        }
    } else {
        fprintf(stderr, "Missing argument for constants file\n");
        exit(1);
    }
    Preprocess(stdin);
}


/*
 * Preprocess --
 *
 *      Reads each line in the file and parses it into
 *      the label, instruction and argument components.
 *      If the instruction is valid, an instruction-dependent
 *      routine is called to process it.
 */
Preprocess(filePtr)
    FILE *filePtr;
{
    char        line[LINE_SIZE + 1];
    char        *colon;
    char        *end;
    char        *linePtr;
    char        *args;
    int         status;
    int         retStat;
    int         len;
    InstrRec    *instrRecPtr;

    status = GetLine(filePtr, line);
    while (status) {
        if (strlen(line) == 0) {
            status = GetLine(filePtr, line);
            continue;
        }
        linePtr = line;

        /*
         * Look for an optional label followed by ':' before white space
         *
         *      label:   instr   arg1,arg2
         *           ^  ^-- start of white space
         *           |-- colon
         * This scheme fails if a : is an argument with white space before it.
         */
        colon = index(linePtr, ':');
        end = FindWhiteSpace(linePtr);
        if (colon != NIL_STR && (colon < end || end == NIL_STR)) {
            /*
             * Found label -- print it on a separate line
             */
            printf("%.*s\n", colon - line + 1, line);
            linePtr = SkipWhiteSpace(colon + 1);
            if (linePtr == NIL_STR) {
                /* nothing else on line */
                status = GetLine(filePtr, line);
                continue;
            }
        }

        /*
         * Look for end of instruction name
         */
        end = FindWhiteSpace(linePtr);
        if (end == NIL_STR) {
            /* no white space */
            len = strlen(linePtr);
        } else {
            len = end - linePtr;
        }
        instrRecPtr = InstrHashFind(linePtr, len);
        if (instrRecPtr == (InstrRec *) NULL) {
            fprintf(stderr, "Unknown instruction: %s\n", linePtr);
            /* skip the line so the null record is never dereferenced */
            status = GetLine(filePtr, line);
            continue;
        }
        if (end == NIL_STR) {
            args = NIL_STR;
        } else {
            args = SkipWhiteSpace(end + 1);
        }
        retStat = (*instrRecPtr->func)(linePtr, len, args);
        if (!retStat) {
            fprintf(stderr, "Bad input: %s\n", line);
        }
        status = GetLine(filePtr, line);
    }
}


/*
 * Instruction Processing Routines
 *
 *      Converts each instruction into a CPP macro
 *
 *      NoArgsFunc      -- instructions w/o arguments.
 *      DefaultFunc     -- instructions with arguments but no special
 *                         processing is needed
 *      CallFunc        -- changes the name of the call instructions
 *      VariableFunc    -- if the instruction argument is a register
 *                         then convert name to use the _reg form
 *                         else convert the name to use the _var form.
 *      ConstantFunc    -- the instruction uses a constant, which must
 *                         be recorded in the constants table.
 *      JumpFunc        -- records the label in a jump-type instruction.
 *      SwitchOnConstantFunc -- handles switch_on_constant.
 *      SwitchOnStructFunc   -- handles switch_on_structure.
 *      EscapeFunc      -- processes escapes.
 */


NoArgsFunc(line, len, args)
    char *line;
    int len;
    char *args;
{
    printf("\t%.*s()\n", len, line);
    return(TRUE);
}

DefaultFunc(line, len, args)
    char *line;
    int len;
    char *args;
{
    /*
     * Add () to args.
     */
    printf("\t%.*s(%s)\n", len, line, args);
    return(TRUE);
}

CallFunc(line, len, args)
    char *line;
    int len;
    char *args;
{
    char *slash = index(args, '/');
    if (slash != NIL_STR) {
        *slash = '\0';
    }

    /*
     * Convert call to call_proc, fail to call_fail, leave procedure
     * and execute alone.
     */
    if (line[0] == 'c') {
        printf("\tcall_proc(%s)\n", args);
    } else if (line[0] == 'f') {
        printf("\tcall_fail(%s)\n", args);
    } else if (line[0] == 'p') {
        printf("\tprocedure(%s)\n", args);
    } else if (line[0] == 'e') {
        printf("\texecute(%s)\n", args);
    } else {
        fprintf(stderr, "Unknown instruction for CallFunc: %s\n", line);
    }
    return(TRUE);
}

VariableFunc(line, len, args)
    char *line;
    int len;
    char *args;
{
    if (args == NIL_STR) {
        return(FALSE);
    }
    if (index(args, 'Y') == NIL_STR) {
        printf("\t%.*s_reg(%s)\n", len, line, args);
    } else {
        printf("\t%.*s_var(%s)\n", len, line, args);
    }
    return(TRUE);
}


ConstantFunc(line, len, args)
    char *line;
    int len;
    char *args;
{
    char *ptr;
    int offset;
    ConstType type;

    if (args == NIL_STR) {
        return(FALSE);
    }

    /*
     * See if instruction is unify_constant
     */
    if (line[0] == 'u') {
        if (args[0] == '&') {
            offset = AddConst(NUMBER, args + 1, strlen(args) - 1);
            printf("\tunify_constant_number(%d)\n", offset);
        } else {
            offset = AddConst(STRING, args, strlen(args));
            printf("\tunify_constant_string(%d)\n", offset);
        }
    } else {
        /*
         * See if instruction is {get,put}_constant
         */
        ptr = 1 + index(line, '_');
        if (*ptr == 'c') {
            ptr = index(args, ',');
            if (ptr == NIL_STR) {
                return(FALSE);          /* missing comma */
            }
            /*
             * See if comma is itself an argument.
             * If not the first argument is missing
             */
            if (*ptr == args[0]) {
                /* look past the leading comma for the separator */
                ptr = index(args + 1, ',');
                if (ptr == NIL_STR) {
                    return(FALSE);
                }
                offset = AddConst(STRING, ",", 1);
                type = STRING;
            } else {
                if (args[0] == '&') {
                    offset = AddConst(NUMBER, args + 1, ptr - args - 1);
                    type = NUMBER;
                } else {
                    offset = AddConst(STRING, args, ptr - args);
                    type = STRING;
                }
            }
            if (type == STRING) {
                printf("\t%.*s_string(%d%s)\n", len, line, offset, ptr);
            } else {
                printf("\t%.*s_number(%d%s)\n", len, line, offset, ptr);
            }
        } else {
            /*
             * Instruction is {get,put}_structure
             */
            ptr = index(args, ',');
            if (ptr == NIL_STR) {
                return(FALSE);
            }
            offset = AddConst(STRING, args, ptr - args);
            printf("\t%.*s(%d%s)\n", len, line, offset, ptr);
        }
    }
    return(TRUE);
}


JumpFunc(line, len, args)
    char *line;
    int len;
    char *args;
{
    int offset;

    if (args == NIL_STR) {
        return(FALSE);
    }
    offset = AddConst(LABEL, args, strlen(args));
    printf("\t%.*s(%d)\n", len, line, offset);
    return(TRUE);
}


SwitchOnConstantFunc(line, len, args)
    char        *line;
    int         len;
    char        *args;
{
    char        temp[LINE_SIZE + 1];
    int         mask;
    int         maskOffset;
    int         i;
    int         tableLen;
    int         status;
#define MAX_ENTRIES 128
    struct {
        int             const;
        char            label[LINE_SIZE + 1];
        ConstType       type;
    } entry[MAX_ENTRIES];

    if (sscanf(args, "%d,", &mask) != 1) {
        fprintf(stderr, "Switch: missing mask: %s\n", args);
        return(FALSE);
    }
    if (mask > MAX_ENTRIES) {
        fprintf(stderr, "Switch: table too big: %s\n", args);
        return(FALSE);
    }
    sprintf(temp, "%d", mask + 1);
    maskOffset = AddConst(NUMBER, temp, strlen(temp));

    /*
     * Table label is on next line following instruction.
     * Read it and forget it.
     */
    status = GetLine(curFilePtr, temp);

    /*
     * Mask is (table size - 1) * 2.  An entry is on 2 lines.
     */
    tableLen = (mask + 1) / 2;
    for (i = 0; i < tableLen; i++) {
        status = GetLine(curFilePtr, temp);
        if (temp[0] == '&') {
            entry[i].const = atoi(&(temp[1]));
            entry[i].type = NUMBER;
        } else {
            entry[i].const = AddConst(STRING, temp, strlen(temp));
            entry[i].type = STRING;
        }
        status = GetLine(curFilePtr, entry[i].label);
    }

    printf("\tswitch_on_constant(%d, %d)\n", maskOffset, constTableOffset);
    fprintf(constFilePtr,
            "\t# switch on constant(%d, %d)\n", maskOffset, constTableOffset);
    for (i = 0; i < tableLen; i++) {
        fprintf(constFilePtr, "\t.long\t%d\n", entry[i].const);
        if (entry[i].type == NUMBER) {
            fprintf(constFilePtr, "\t.long\tconst_num_type\n");
        } else {
            fprintf(constFilePtr, "\t.long\tconst_type\n");
        }
        fprintf(constFilePtr, "\t.long\t%s\n", entry[i].label);
    }
    constTableOffset += 12 * tableLen;
    fprintf(constFilePtr, "\n");
    if (constTableOffset > MAX_CONST_OFFSET) {
        fprintf(stderr, "Warning: constant table overflow\n");
        fprintf(constFilePtr, "\t# Warning: constant table overflow\n");
    }
    return(TRUE);
}

SwitchOnStructFunc(line, len, args)
    char        *line;
    int         len;
    char        *args;
{
    char        temp[LINE_SIZE + 1];
    int         mask;
    int         maskOffset;
    int         i;
    int         tableLen;
    int         status;
#define MAX_ENTRIES 128
    struct {
        int             const;
        char            label[LINE_SIZE + 1];
    } entry[MAX_ENTRIES];

    if (sscanf(args, "%d,", &mask) != 1) {
        fprintf(stderr, "Switch: missing mask: %s\n", args);
        return(FALSE);
    }
    if (mask > MAX_ENTRIES) {
        fprintf(stderr, "Switch: table too big: %s\n", args);
        return(FALSE);
    }
    sprintf(temp, "%d", mask + 1);
    maskOffset = AddConst(NUMBER, temp, strlen(temp));

    /*
     * Table label is on next line following instruction.
     * Read it and forget it.
     */
    status = GetLine(curFilePtr, temp);

    /*
     * Mask is (table size - 1) * 2.  An entry is on 2 lines.
     */
    tableLen = (mask + 1) / 2;
    for (i = 0; i < tableLen; i++) {
        status = GetLine(curFilePtr, temp);
        entry[i].const = AddConst(STRING, temp, strlen(temp));
        status = GetLine(curFilePtr, entry[i].label);
    }

    printf("\tswitch_on_structure(%d, %d)\n", maskOffset, constTableOffset);
    fprintf(constFilePtr,
            "\t# switch on structure(%d, %d)\n", maskOffset, constTableOffset);
    for (i = 0; i < tableLen; i++) {
        fprintf(constFilePtr, "\t.long\t%d\n", entry[i].const);
        fprintf(constFilePtr, "\t.long\t%s\n", entry[i].label);
    }
    constTableOffset += 8 * tableLen;
    fprintf(constFilePtr, "\n");
    if (constTableOffset > MAX_CONST_OFFSET) {
        fprintf(stderr, "Warning: constant table overflow\n");
        fprintf(constFilePtr, "\t# Warning: constant table overflow\n");
    }
    return(TRUE);
}


EscapeFunc(line, len, args)
    char        *line;
    int         len;
    char        *args;
{
    EscapeRec   *ptr;
    char        newArg[LINE_SIZE + 1];
    char        *comma;

    comma = index(args, ',');
    if (comma == NIL_STR) {
        strcpy(newArg, args);
    } else {
        strncpy(newArg, args, comma - args);
        newArg[comma - args] = '\0';
    }

    ptr = EscapeHashFind(newArg);
    if (ptr == (EscapeRec *) NULL) {
        printf("\t# ESCAPE(%s,T1)\n", args);
        fprintf(stderr, "Unknown escape: %s\n", args);
    } else {
        printf("\t%s\n", ptr->command);
    }
    return(TRUE);
}


/*
 * AddConst --
 *
 *      Adds a constant to the constant table if it was not
 *      already in it.
 */
AddConst(type, name, len)
    ConstType type;
    char *name;
    int len;
{
    ConstRec *ptr;

    /* drop trailing white space from the constant */
    while (name[len - 1] == ' ' || name[len - 1] == '\t') {
        len--;
    }
    ptr = ConstHashFind(type, name, len);
    if (ptr == (ConstRec *) NULL) {
        ptr = ConstHashInsert(type, name, len);
    }
    return(ptr->offset);
}


InitHash()
{
    int i;

    for (i = 0; i < HASH_SIZE; i++) {
        List_Init(&(instrHashTable[i]));
        List_Init(&(constHashTable[i]));
        List_Init(&(escapeHashTable[i]));
    }

    InstrHashInsert("end",                  NoArgsFunc);
    InstrHashInsert("allocate",             NoArgsFunc);
    InstrHashInsert("cut",                  NoArgsFunc);
    InstrHashInsert("deallocate",           NoArgsFunc);
    InstrHashInsert("proceed",              NoArgsFunc);
    InstrHashInsert("quit",                 NoArgsFunc);
    InstrHashInsert("unify_nil",            NoArgsFunc);

    InstrHashInsert("get_list",             DefaultFunc);
    InstrHashInsert("get_nil",              DefaultFunc);
    InstrHashInsert("mark",                 DefaultFunc);
    InstrHashInsert("pause",                DefaultFunc);
    InstrHashInsert("put_list",             DefaultFunc);
    InstrHashInsert("put_nil",              DefaultFunc);
    InstrHashInsert("switch_on_term",       DefaultFunc);
    InstrHashInsert("trust_me_else",        DefaultFunc);
    InstrHashInsert("unify_void",           DefaultFunc);
    InstrHashInsert("put_unsafe_value",     DefaultFunc);

    InstrHashInsert("switch_on_constant",   SwitchOnConstantFunc);
    InstrHashInsert("switch_on_structure",  SwitchOnStructFunc);
    InstrHashInsert("procedure",            CallFunc);
    InstrHashInsert("call",                 CallFunc);
    InstrHashInsert("fail",                 CallFunc);
    InstrHashInsert("execute",              CallFunc);
    InstrHashInsert("escape",               EscapeFunc);

    InstrHashInsert("get_variable",         VariableFunc);
    InstrHashInsert("get_value",            VariableFunc);
    InstrHashInsert("put_variable",         VariableFunc);
    InstrHashInsert("put_value",            VariableFunc);
    InstrHashInsert("unify_cdr",            VariableFunc);
    InstrHashInsert("unify_value",          VariableFunc);
    InstrHashInsert("unify_unsafe_value",   VariableFunc);
    InstrHashInsert("unify_variable",       VariableFunc);

    InstrHashInsert("get_constant",         ConstantFunc);
    InstrHashInsert("get_structure",        ConstantFunc);
    InstrHashInsert("put_constant",         ConstantFunc);
    InstrHashInsert("put_structure",        ConstantFunc);
    InstrHashInsert("unify_constant",       ConstantFunc);

    InstrHashInsert("cutd",                 JumpFunc);
    InstrHashInsert("retry_me_else",        JumpFunc);
    InstrHashInsert("retry",                JumpFunc);
    InstrHashInsert("try_me_else",          JumpFunc);
    InstrHashInsert("try",                  JumpFunc);
    InstrHashInsert("trust",                JumpFunc);

    EscapeHashInsert("<",           "less_than()");
    EscapeHashInsert("</2",         "less_than()");
    EscapeHashInsert("<=",          "less_than_or_equal()");
    EscapeHashInsert("=",           "equal()");
    EscapeHashInsert("=:=/2",       "equal()");
    EscapeHashInsert("=</2",        "less_than_or_equal()");
    EscapeHashInsert(">",           "greater_than()");
    EscapeHashInsert(">/2",         "greater_than()");
    EscapeHashInsert(">=",          "greater_than_or_equal()");
    EscapeHashInsert(">=/2",        "greater_than_or_equal()");
    EscapeHashInsert("access",      "access()");
    EscapeHashInsert("access/2",    "access()");
    EscapeHashInsert("integer",     "integer()");
    EscapeHashInsert("integer/1",   "integer()");
    EscapeHashInsert("is",          "is_escape()");
    EscapeHashInsert("is/4",        "is_escape()");
    EscapeHashInsert("is/2",        "is_2_escape()");
    EscapeHashInsert("nl",          "escape(NL,T1)");
    EscapeHashInsert("nl/0",        "escape(NL,T1)");
    EscapeHashInsert("set",         "setter()");
    EscapeHashInsert("set/2",       "setter()");
    EscapeHashInsert("var",         "var_escape()");
    EscapeHashInsert("var/1",       "var_escape()");
    EscapeHashInsert("write",       "escape(WRITE,T1)");
    EscapeHashInsert("write/1",     "escape(WRITE,T1)");

    /*
     * These escapes aren't handled yet.
     *
    EscapeHashInsert("=\=/2",       "---");
    EscapeHashInsert("asserta",     "---");
    EscapeHashInsert("assertz",     "---");
    EscapeHashInsert("call",        "Do by hand for now");
    EscapeHashInsert("retracta",    "---");
    EscapeHashInsert("retractp",    "---");
     */
}


/*
 * Hash Routines:
 *
 *      Insert and Find routines for instructions, constants and escapes.
 */

InstrRec *
InstrHashFind(name, len)
    char *name;
    int len;
{
    int hashId;
    register InstrRec *instrRecPtr;

    hashId = HASH_FUNC(name, len);
    LIST_FORALL(&instrHashTable[hashId], (List_Links *) instrRecPtr) {
        if (strncmp(instrRecPtr->name, name, len) == 0) {
            return(instrRecPtr);
            break;
        }
    }
    return((InstrRec *) NULL);
}


void
InstrHashInsert(name, func)
    char *name;
    int (*func)();
{
    int hashId;
    InstrRec *instrRecPtr;

    hashId = HASH_FUNC(name, strlen(name));
    instrRecPtr = (InstrRec *) calloc(1, sizeof(InstrRec));
    if (instrRecPtr == (InstrRec *) NULL) {
        fprintf(stderr, "Calloc failed in InstrHashInsert\n");
        exit(1);
    }
    strcpy(instrRecPtr->name, name);
    instrRecPtr->func = func;
    List_Insert((List_Links *) instrRecPtr,
                LIST_ATFRONT(&instrHashTable[hashId]));
}


ConstRec *
ConstHashFind(type, name, len)
    ConstType type;
    char *name;
    int len;
{
    register ConstRec *constRecPtr;
    int hashId;

    hashId = HASH_FUNC(name, len);
    LIST_FORALL(&constHashTable[hashId], (List_Links *) constRecPtr) {
        if ((strncmp(constRecPtr->name, name, len) == 0) &&
            (constRecPtr->type == type)) {
            return(constRecPtr);
            break;
        }
    }
    return((ConstRec *) NULL);
}


ConstRec *
ConstHashInsert(type, name, len)
    ConstType type;
    char *name;
    int len;
{
    int hashId;
    ConstRec *constRecPtr;
    int i;

    hashId = HASH_FUNC(name, len);
    constRecPtr = (ConstRec *) calloc(1, sizeof(ConstRec));
    if (constRecPtr == (ConstRec *) NULL) {
        fprintf(stderr, "Calloc failed in ConstHashInsert\n");
        exit(1);
    }
    strncpy(constRecPtr->name, name, len);
    constRecPtr->name[len] = '\0';
    constRecPtr->type = type;
    constRecPtr->offset = constTableOffset;
    List_Insert((List_Links *) constRecPtr,
                LIST_ATFRONT(&constHashTable[hashId]));

    if (type == LABEL) {
        /* a label occupies one 4-byte word */
        fprintf(constFilePtr, "\t.long\t%.*s\t# label, offset = %d\n\n",
                len, name, constTableOffset);
        constTableOffset += 4;
    } else if (type == STRING) {
        /* a string occupies one word per character plus a terminator */
        fprintf(constFilePtr, "\t# string %.*s, offset = %d\n",
                len, name, constTableOffset);
        for (i = 0; i < len; i++) {
            fprintf(constFilePtr, "\t.long\t%d\t# '%c'\n", name[i], name[i]);
        }
        fprintf(constFilePtr, "\t.long\t0\n\n");
        constTableOffset += 4 * (len + 1);
    } else {
        fprintf(constFilePtr, "\t.long\t%.*s\t# number, offset = %d\n\n",
                len, name, constTableOffset);
        constTableOffset += 4;
    }

    if (constTableOffset > MAX_CONST_OFFSET) {
        fprintf(stderr, "Warning: constant table overflow\n");
        fprintf(constFilePtr, "\t# Warning: constant table overflow\n");
    }
    return(constRecPtr);
}


EscapeRec *
EscapeHashFind(name)
    char *name;
{
    int hashId;
    register EscapeRec *ptr;

    hashId = HASH_FUNC(name, strlen(name));
    LIST_FORALL(&escapeHashTable[hashId], (List_Links *) ptr) {
        if (strcmp(ptr->name, name) == 0) {
            return(ptr);
            break;
        }
    }
    return((EscapeRec *) NULL);
}


void
EscapeHashInsert(name, command)
    char *name;
    char *command;
{
    int hashId;
    EscapeRec *ptr;

    hashId = HASH_FUNC(name, strlen(name));
    ptr = (EscapeRec *) calloc(1, sizeof(EscapeRec));
    if (ptr == (EscapeRec *) NULL) {
        fprintf(stderr, "Calloc failed in EscapeHashInsert\n");
        exit(1);
    }
    strcpy(ptr->name, name);
    strcpy(ptr->command, command);
    List_Insert((List_Links *) ptr, LIST_ATFRONT(&escapeHashTable[hashId]));
}
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GetLine - 

/*
 *      Gets a line from the file that does not start with a comment
 *      character (#).  The line is null-terminated and the first
 *      newline '\n' is set to '\0'.
 *
 * Result:
 *      0       There was an error or EOF condition reading the line.
 *      1       The line was read successfully.
 */
GetLine(filePtr, buffer)
    FILE *filePtr;
    char *buffer;
{
    char *status;
    char line[LINE_SIZE + 1];
    int i;
    int j;

    /*
     * Skip the line if it begins with a comment character
     */
    do {
        status = fgets(line, LINE_SIZE, filePtr);
        if (status == NIL_STR) {
            return(0);          /* error or EOF */
        }
    } while (line[0] == '#');

    /*
     * Trim leading white space while copying to the output arg.
     * Convert the '\n' (if there is one) to a null character.
     */
    i = 0;
    while (i < LINE_SIZE && (line[i] == ' ' || line[i] == '\t')) {
        i++;
    }
    j = 0;
    buffer[j] = '\0';
    while ((line[i] != '\n') && (line[i] != '\0')) {
        buffer[j] = line[i];
        i++;
        j++;
    }
    buffer[j] = '\0';

if  (bufferfOl  ==  { 

        FILE *filePtr;

        filePtr = fopen(&(buffer[1]), "r");
        if (filePtr == (FILE *) NULL) {
            fprintf(stderr, "Can't open %s for reading\n", &(buffer[1]));
        } else {
            oldFilePtr = curFilePtr;
            curFilePtr = filePtr;
            Preprocess(filePtr);
            curFilePtr = oldFilePtr;
        }
        buffer[0] = '\0';
    }
    return(1);
}


/*
 * SkipWhiteSpace
 * FindWhiteSpace
 *
 *      Routines to skip over white space or
 *      skip over non-white space in a line.
 */

char *
SkipWhiteSpace(string)
    char *string;
{
    register char *s = string;
    while (1) {
        if (*s == ' ' || *s == '\t') {
            s++;
        } else if (*s == '\0') {
            return(NIL_STR);
        } else {
            return(s);
        }
    }
}

char *
FindWhiteSpace(string)
    char *string;
{
    register char *s = string;
    while (1) {
        if (*s == ' ' || *s == '\t') {
            return(s);
        } else if (*s == '\0') {
            return(NIL_STR);
        }
        s++;
    }
}
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Postproc.c - 

/*
 *      A filter to convert the output of the CPP into a form
 *      that the SPUR assembler can handle.  Local labels are
 *      resolved into unique names.
 */

#include <stdio.h>
#include <ctype.h>

#define TRUE    1
#define FALSE   0
typedef int Boolean;

#include "list.h"

int lineNum = 1;
int labelNum = 1;

typedef struct {
    List_Links  links;
    char        name[80];
    int         number;
    Boolean     forwardRef;
} LabelRec;

#define HASH_SIZE 101

List_Links hashTable[HASH_SIZE];

#define HASH_FUNC(name) \
    ((name[0] * name[strlen(name) - 1]) % HASH_SIZE)

/*
 * HashFind --
 *
 *      Find and retrieve the record for a given label in the hash table.
 */
LabelRec *
HashFind(name)
    char *name;
{
    int                 hashId;
    register LabelRec   *labelRecPtr;
    Boolean             found = FALSE;

    hashId = HASH_FUNC(name);
    LIST_FORALL(&hashTable[hashId], (List_Links *) labelRecPtr) {
        if (strcmp(labelRecPtr->name, name) == 0) {
            found = TRUE;
            break;
        }
    }
    if (!found) {
        return((LabelRec *) NULL);
    } else {
        return(labelRecPtr);
    }
}
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*  Hashlnsert - 

/*
 *      Insert a label record in the hash table.
 */
void
HashInsert(labelRecPtr)
    LabelRec *labelRecPtr;
{
    int hashId;

    hashId = HASH_FUNC(labelRecPtr->name);
    List_Insert((List_Links *) labelRecPtr,
                LIST_ATFRONT(&hashTable[hashId]));
}

/*
 * HashDelete --
 *
 *      Delete a label record from the hash table.
 */
void
HashDelete(labelRecPtr)
    LabelRec *labelRecPtr;
{
    List_Remove((List_Links *) labelRecPtr);
}


/*
 * LabelProcess --
 *
 *      Resolves forward and backward label references.
 *      Labels are stored in a hash table.
 */
LabelProcess()
{
    char        name[80];
    char        c;
    int         i = 0;
    LabelRec    *labelRecPtr;

    c = getchar();
    while (c != 'f' && c != 'b' && c != ':') {

if  (!isalpha{c)  &&  !isdigit(c)  &&  c  !=  { 

fprintffstderr,  "Malformed  label  at  line  %d\n",  lineNum); 
exit(l); 

I 

namefil  -  c; 
i+  +; 

c  =  getcharl); 

I 


name[80|; 

c; 

i  =  0; 
"labelRecPtr; 


LabelProcess 


namefi]  =  \0'; 


/* 
*  / 


fprintflstderr,  "Label  %s\n".  name): 

labelRecPtr  =  HashFindl  name); 
if  i labelRecPtr  !=  NULL)  { 
switch  (c)  { 
case 
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if  (labelRecPtr  —  >  forwardRef)  { 

labelRecPtr— >forwardRef  =  FALSE; 
}  else  { 

labelRecPtr— >  number  =  labelNum: 
labelNum  +  +; 

} 

printf("Z%d:",  labelRecPtr  —  >  number); 

break; 

case  f': 

if  (UabelRecPtr— >  forwardRef)  { 

labelRecPtr—  >forwardRef  =  TRUE; 
lahplRecPtr—  >  number  =  labelNum; 
labelNum  +  +; 

} 

printfCZf&d",  labelRecPtr  —  >  number); 
break; 
case  b'; 

printf("Z%d",  labelRecPtr  —  >  number); 

break; 


}  else  { 

if  i  c  -  =  b  ' )  { 

fprintflstderr,  "Undefined  label  at  line  ^dVn",  lineNum); 
exit!  1); 


labelRecPtr  =  (LabelRec  *)  malloc<sizeof(LabelRec)); 
labelRecPtr— >  forwardRef  =  FALSE; 
strcpyl  labelRecPtr—  >name,namei; 

Hashlnsertl  labelRecPtr); 
switch  (c)  { 
case  V: 

labelRecPtr—  >forwardRef  =  FALSE; 
labelRecPtr- > number  =  labelNum; 
labelNum  +•  +■ ; 

printf("Z%d:",  labelRecPtr  —  >  number); 

break; 
case  T: 

labelRecPtr  —  >  forwardRef  =  TRUE; 
labelRecPtr ->  number  =  labelNum; 
labelNum  +  +■ ; 

printf(”Z%d",  labelRecPtr  -  >  number); 

break; 
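The label handling above can be restated as a small stand-alone sketch. It is an illustration only, under stated assumptions: `resolve_label`, `find_label`, the fixed-size table and the linear search are hypothetical stand-ins (the listing uses `HashFind` and `HashInsert` on a hash table), and the function returns the number instead of printing `Z<number>`.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LABELS 64

/* One record per local label name, mirroring the fields LabelProcess uses. */
struct label {
    char name[80];
    int  forward_ref;   /* nonzero while a forward reference is pending */
    int  number;        /* unique number currently bound to the name */
};

static struct label labels[MAX_LABELS];
static int label_count = 0;
static int label_num = 0;

/* Linear search; a stand-in for the hash table of the listing. */
static struct label *find_label(const char *name)
{
    for (int i = 0; i < label_count; i++)
        if (strcmp(labels[i].name, name) == 0)
            return &labels[i];
    return NULL;
}

/*
 * Resolve one occurrence of a local label.  kind is ':' for a
 * definition, 'f' for a forward reference, 'b' for a backward
 * reference.  Returns the unique number, or -1 on error.
 */
int resolve_label(const char *name, char kind)
{
    struct label *l = find_label(name);

    if (l == NULL) {
        if (kind == 'b' || label_count == MAX_LABELS)
            return -1;          /* backward reference to an unknown label */
        if (kind != ':' && kind != 'f')
            return -1;
        l = &labels[label_count++];
        strncpy(l->name, name, sizeof l->name - 1);
        l->name[sizeof l->name - 1] = '\0';
        l->forward_ref = (kind == 'f');
        l->number = label_num++;
        return l->number;
    }
    switch (kind) {
    case ':':
        if (l->forward_ref)
            l->forward_ref = 0;         /* definition satisfies pending refs */
        else
            l->number = label_num++;    /* redefinition gets a fresh number */
        break;
    case 'f':
        if (!l->forward_ref) {
            l->forward_ref = 1;         /* names the next definition */
            l->number = label_num++;
        }
        break;
    case 'b':
        break;                          /* most recently bound number */
    default:
        return -1;
    }
    return l->number;
}
```

With this scheme a name such as `1` can be defined many times in one file: `1f` always names the next `1:` and `1b` the previous one, which is what lets the macros reuse `@1`, `@2`, and so on.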


/*
 * Main -
 *
 *      Scans through the file, and applies the following mappings:
 *          ';'      -> '\n'
 *          '!'      -> '#'
 *          '@label' -> Calls LabelProc with label
 */
main()
{
    char        c;
    int         i;
    Boolean     justHadNL = TRUE;

    for (i = 0; i < HASH_SIZE; i++) {
        List_Init(&(hashTable[i]));
    }
    c = getchar();
    while (c != EOF) {
        if (c == '\n') {
            lineNum++;
            if (!justHadNL) {
                putchar(c);
            }
            justHadNL = TRUE;
            c = getchar();
            continue;
        }
        justHadNL = FALSE;
        if (c == ';') {
            putchar('\n');
        } else if (c == '#') {
            putchar(' ');
            putchar(c);
        } else if (c == '!') {
            putchar('#');
        } else if (c == '@') {
            LabelProcess();
        } else {
            putchar(c);
        }
        c = getchar();
    }
}
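The character mappings applied by the main loop can be sketched on a string instead of standard input. This is a minimal sketch under assumptions: `map_chars` is a hypothetical name, label handling (`@`) is left out, and the output for `!` is taken from the header comment because the corresponding call is truncated in the listing.

```c
#include <stddef.h>

/*
 * Apply the post-processor's character mappings to a string:
 *   ';' -> newline, '#' -> " #", '!' -> '#',
 * and drop a newline that directly follows another newline.
 * Stops early if the output buffer is nearly full.
 * Returns the number of characters written (excluding the NUL).
 */
size_t map_chars(const char *in, char *out, size_t cap)
{
    size_t n = 0;
    int just_had_nl = 1;            /* suppress leading blank lines */

    if (cap == 0)
        return 0;
    for (; *in != '\0' && n + 2 < cap; in++) {
        char c = *in;

        if (c == '\n') {
            if (!just_had_nl)
                out[n++] = c;
            just_had_nl = 1;
            continue;
        }
        just_had_nl = 0;
        if (c == ';') {
            out[n++] = '\n';        /* ';' separates statements inside macros */
        } else if (c == '#') {
            out[n++] = ' ';
            out[n++] = c;
        } else if (c == '!') {
            out[n++] = '#';
        } else {
            out[n++] = c;
        }
    }
    out[n] = '\0';
    return n;
}
```

Because the macros in instructions.h expand to a single line with `;` between instructions, this pass is what turns one expanded macro back into one instruction per line for the assembler.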
/*
 * list.c -
 *
 *      This file contains procedures for manipulating lists.
 *      Structures may be inserted into or deleted from lists, and
 *      they may be moved from one place in a list to another.
 *
 *      The header file contains macros to help in determining the destination
 *      locations for List_Insert and List_Move.  See list.h for details.
 *
 * Copyright (C) 1985 Regents of the University of California
 * All rights reserved.
 */

#ifndef lint
static char rcsid[] = "$Header: List.c,v 1.3 86/02/22 14:26:31 nelson Exp $ SPRITE (Berkeley)";
#endif not lint

#include "list.h"

/*
 * List_Insert -
 *
 *      Insert the list element pointed to by itemPtr into a List after
 *      destPtr.
 *
 * Results:
 *      No value is returned.
 *
 * Side effects:
 *      The list containing destPtr is modified to contain itemPtr.
 */
void
List_Insert(itemPtr, destPtr)
    register List_Links *itemPtr;       /* structure to insert */
    register List_Links *destPtr;       /* structure after which to insert it */
{
    itemPtr->nextPtr = destPtr->nextPtr;
    itemPtr->prevPtr = destPtr;
    destPtr->nextPtr->prevPtr = itemPtr;
    destPtr->nextPtr = itemPtr;
}

/*
 * List_Remove -
 *
 *      Remove a list element from the list in which it is contained.
 *
 * Results:
 *      No value is returned.
 *
 * Side effects:
 *      The given structure is removed from its containing list.
 */
void
List_Remove(itemPtr)
    register List_Links *itemPtr;       /* list element to remove */
{
    if (itemPtr == itemPtr->nextPtr) {
        return;
    }
    itemPtr->prevPtr->nextPtr = itemPtr->nextPtr;
    itemPtr->nextPtr->prevPtr = itemPtr->prevPtr;
}

/*
 * List_Move -
 *
 *      Move the list element referenced by itemPtr to follow destPtr.
 *
 * Results:
 *      No value is returned.
 *
 * Side effects:
 *      List ordering is modified.
 */
void
List_Move(itemPtr, destPtr)
    register List_Links *itemPtr;       /* list element to be moved */
    register List_Links *destPtr;       /* element after which it is to be placed */
{
    /*
     * It is conceivable that someone will try to move a list element to
     * be after itself.
     */
    if (itemPtr == destPtr) {
        return;
    }
    List_Remove(itemPtr);
    List_Insert(itemPtr, destPtr);
}

/*
 * List_Init -
 *
 *      Initialize a header pointer to point to an empty list.  The List_Links
 *      structure must already be allocated.
 *
 * Results:
 *      No value is returned.
 *
 * Side effects:
 *      The header's pointers are modified to point to itself.
 */
void
List_Init(headerPtr)
    register List_Links *headerPtr;     /* Pointer to a List_Links structure
                                         * to be header */
{
    headerPtr->nextPtr = headerPtr;
    headerPtr->prevPtr = headerPtr;
}
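The four list routines operate on a circular doubly linked list whose header is itself a `List_Links` node. The following sketch restates them with ANSI prototypes so they can be compiled and exercised on their own. The layout of `List_Links` is an assumption (list.h is not reproduced in this listing), and `list_length` is a hypothetical helper added only for checking.

```c
#include <stddef.h>

/* Assumed layout; the real definition lives in list.h. */
typedef struct List_Links {
    struct List_Links *prevPtr;
    struct List_Links *nextPtr;
} List_Links;

/* An empty list is a header that points to itself in both directions. */
void List_Init(List_Links *headerPtr)
{
    headerPtr->nextPtr = headerPtr;
    headerPtr->prevPtr = headerPtr;
}

/* Link itemPtr into the list immediately after destPtr. */
void List_Insert(List_Links *itemPtr, List_Links *destPtr)
{
    itemPtr->nextPtr = destPtr->nextPtr;
    itemPtr->prevPtr = destPtr;
    destPtr->nextPtr->prevPtr = itemPtr;
    destPtr->nextPtr = itemPtr;
}

/* Unlink itemPtr; a node that points to itself is left alone. */
void List_Remove(List_Links *itemPtr)
{
    if (itemPtr == itemPtr->nextPtr)
        return;
    itemPtr->prevPtr->nextPtr = itemPtr->nextPtr;
    itemPtr->nextPtr->prevPtr = itemPtr->prevPtr;
}

/* Move itemPtr so that it follows destPtr. */
void List_Move(List_Links *itemPtr, List_Links *destPtr)
{
    if (itemPtr == destPtr)
        return;
    List_Remove(itemPtr);
    List_Insert(itemPtr, destPtr);
}

/* Hypothetical helper: count the elements, not including the header. */
size_t list_length(const List_Links *headerPtr)
{
    size_t n = 0;
    const List_Links *p;

    for (p = headerPtr->nextPtr; p != headerPtr; p = p->nextPtr)
        n++;
    return n;
}
```

Inserting after the header, as `HashInsert` does through `LIST_ATFRONT`, puts the newest record at the front of its bucket.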

#!/bin/csh -f
#
# A script to "compile" PLM code into a binary that runs on
# the SPUR simulator (barb).
#

onintr end
set nonomatch
set DIR = ~andrew/spur
set POST = $DIR/Proc/postproc
set PRE = $DIR/Proc/preproc
set HEADERS = $DIR/Headers
set INCLUDE = "-I$HEADERS -P"
set BARB = ~zorn/sim/barb
set CODE2 = ~zorn/sim/barb/th/code2.s
set SAS = $BARB/sas/sas
set SLD = $BARB/sas/sld

@ numargs = $#argv
set Preproc = 0
set Postproc = 0
set Assemble = 0
set Load = 0

top:
switch ($1)
    case "-help":
        echo "Options:  -pre  == preproc"
        echo "          -post == postproc"
        echo "          -load == load"
        echo "          -asm  == assemble"
        goto end
        breaksw
    case "-pre":
        shift
        set Preproc = 1
        goto top
        breaksw
    case "-post":
        shift
        set Postproc = 1
        goto top
        breaksw
    case "-asm":
        shift
        set Assemble = 1
        goto top
        breaksw
    case "-load":
        shift
        set Load = 1
        goto top
        breaksw
    default:
        set longname = $1
        set name = "`basename $1:t .w`"
        if ($numargs == 1) then
            set Preproc = 1
            set Postproc = 1
            set Assemble = 1
            set Load = 1
        endif
        breaksw
endsw

if ($Preproc == 1) then
    echo "Pre-process:"
    rm -f $name.a $name.spur
    cp $HEADERS/header.h $name.spur
    $PRE $name.const < $longname >> $name.spur
    cat $HEADERS/trailer.h $name.const >> $name.spur
    rm -f $name.const
endif

if ($Postproc == 1) then
    echo "Post-process:"
    cat $name.spur | /lib/cpp $INCLUDE | $POST > $name.a
endif

if ($Assemble == 1) then
    echo "Assemble:"
    rm -f temp $name.s $name.th
    $SAS < $name.a > $name.s
    cat $CODE2 $name.s > temp
    as -o temp2 temp
endif

if ($Load == 1) then
    echo "Load:"
    $SLD -o $name.th temp2
endif

end:

#include "instructions.h"
#include "defs.h"

/*
 * Initialize code:
 */
        .org 0x3000

/*
 * Turn off tag traps.
 */
        wr_special      upsw, r0, $0x880

/*
 * Initialize all of the registers.
 */
        add     CONST_PTR, r0, $0x780
        sll     CONST_PTR, CONST_PTR, $3
        sll     CONST_PTR, CONST_PTR, $3
        wr_tag  CONST_PTR, $cut_0

/*
 * Put 64K into T1.
 */
        add     T1, r0, $0x800
        sll     T1, T1, $3
        sll     T1, T1, $2

/*
 * Put 0x500000 in T2.
 */
        add     T2, r0, $0x500
        sll     T2, T2, $3
        sll     T2, T2, $3
        sll     T2, T2, $3
        sll     T2, T2, $3

/*
 * Put 8K in T3
 */
        add     T3, r0, $1024
        sll     T3, T3, $3

/*
 * All data starts at 0x500000.  Each stack and such is put at locations
 * similar to those used in the PLM except that each is multiplied by 8
 * since the PLM is word addressed and SPUR is byte addressed with 8 bytes
 * per word.
 */
        add     H, T2, $1024
        add     E, T2, T1
        add     B, E, 0
        st_32   B, CONST_PTR, $stack_bottom
        add     TR, E, T3
        add     CP, r0, 0
        add     S, H, 0
        add     T4, TR, T3
        st_32   T4, CONST_PTR, $PDL_offset
        st_32   T4, CONST_PTR, $stack_offset
        add     T4, T2, $128
        st_32   T4, CONST_PTR, $H2_offset
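The start-up code builds each large constant from a small immediate followed by left shifts of 2 or 3 bits, then derives the initial register values by addition. The arithmetic can be checked with a short sketch. The names `build_const`, `struct layout`, `initial_layout` and the field `stack_top` (the value stored at both `$PDL_offset` and `$stack_offset`) are hypothetical and used only for illustration.

```c
#include <stdint.h>

/* Apply a sequence of left shifts to a small immediate, as the
 * add/sll sequences in the start-up code do. */
uint32_t build_const(uint32_t imm, const int *shifts, int nshifts)
{
    uint32_t v = imm;

    for (int i = 0; i < nshifts; i++)
        v <<= shifts[i];
    return v;
}

/* Initial values computed by the start-up code (field names assumed). */
struct layout {
    uint32_t const_ptr, H, E, B, TR, S, stack_top, H2;
};

struct layout initial_layout(void)
{
    static const int by6[]  = {3, 3};
    static const int by5[]  = {3, 2};
    static const int by12[] = {3, 3, 3, 3};
    static const int by3[]  = {3};
    const uint32_t T1 = build_const(0x800, by5, 2);    /* 64K */
    const uint32_t T2 = build_const(0x500, by12, 4);   /* 0x500000 */
    const uint32_t T3 = build_const(1024, by3, 1);     /* 8K */
    struct layout l;

    l.const_ptr = build_const(0x780, by6, 2);
    l.H = T2 + 1024;            /* add H, T2, $1024 */
    l.E = T2 + T1;              /* add E, T2, T1 */
    l.B = l.E;                  /* add B, E, 0 */
    l.TR = l.E + T3;            /* add TR, E, T3 */
    l.S = l.H;                  /* add S, H, 0 */
    l.stack_top = l.TR + T3;    /* add T4, TR, T3 */
    l.H2 = T2 + 128;            /* add T4, T2, $128 */
    return l;
}
```

Under these assumptions H starts at 0x500400, E and B at 0x510000, TR at 0x512000, and the value stored at `$PDL_offset` and `$stack_offset` is 0x514000.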

instructions.h

#define get_stack_base(reg) \
        ld_32   reg, CONST_PTR, $stack_bottom


/****************  basics  ****************/

#define deref(reg, temp) \
        rd_tag  temp, reg; \
        and     temp, temp, $type_mask; \
        cmp_br_delayed  neq, temp, $var_type, @103f; \
        rd_tag  temp, reg; \
        ld_40   reg, reg, 0; \
@101:   rd_tag  temp, reg; \
        and     temp, temp, $type_mask; \
        cmp_br_delayed  neq, temp, $var_type, @103f; \
        rd_tag  temp, reg; \
        ld_40   temp, reg, 0; \
        cmp_br_delayed  eq, temp, reg, @102f; \
        Nop; \
        jump    @101b$w; \
        add     reg, temp, 0; \
@102:   rd_tag  temp, reg; \
@103:

#define trail(reg) \
        st_40   reg, TR, 0; \
        add     TR, TR, $8
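The `deref` and `trail` macros implement two standard pieces of a Prolog engine: following a chain of variable references to its end, and recording a binding so that it can be undone on backtracking. The sketch below shows the idea in C on a simplified cell. It is not a transcription of the macros: the cell layout, `bind_var`, `undo_to` and the fixed-size trail array are assumptions made for illustration, and cdr-coding and the other tag values are ignored.

```c
#include <stddef.h>

enum tag { TAG_VAR, TAG_CONST };

/* Simplified tagged cell; an unbound variable refers to itself. */
struct cell {
    enum tag     tag;
    struct cell *ref;     /* TAG_VAR: the cell this variable points to */
    long         value;   /* TAG_CONST: the constant */
};

/* Follow variable references until a non-variable or an unbound
 * (self-referencing) variable is reached. */
struct cell *deref(struct cell *c)
{
    while (c->tag == TAG_VAR && c->ref != c)
        c = c->ref;
    return c;
}

#define TRAIL_MAX 128
static struct cell *trail_stack[TRAIL_MAX];
static size_t trail_top = 0;

/* Push a bound variable on the trail; returns -1 if the trail is full. */
int trail(struct cell *c)
{
    if (trail_top == TRAIL_MAX)
        return -1;
    trail_stack[trail_top++] = c;
    return 0;
}

/* Bind an unbound variable to target and record it on the trail. */
int bind_var(struct cell *var, struct cell *target)
{
    var->ref = target;
    return trail(var);
}

/* Undo every binding made since mark, as backtracking would. */
void undo_to(size_t mark)
{
    while (trail_top > mark) {
        struct cell *c = trail_stack[--trail_top];
        c->ref = c;
    }
}
```

In the macro version the trail pointer is the register `TR`, which advances by 8 bytes per entry because each SPUR word occupies 8 bytes.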


#define decdr(reg) \
        tag_cmp_br_delayed      ne_tag, reg, $list_cdr_type, @101f; \
        add_nt  S, S, $8; \
        add_nt  S, reg, 0; \
        ld_40   reg, S, 0; \
        jump    @102f$w; \
        add_nt  S, S, $8; \
@101:   tag_cmp_br_delayed      ne_tag, reg, $nil_const_type, @102f; \
        Nop; \
        ld_40   reg, CONST_PTR, $nil_offset; \
@102:

#define Pop_ChoicePoint(temp) \
        add     OLD_H, H, 0; \
        add     OLD_E, E, 0; \
        add     OLD_TR, TR, 0; \
        add     OLD_CP, CP, 0; \
        add     OLD_BP, BP, 0; \
        rd_special      temp, cpu_pc; \
        return  temp, $12; \
        Nop

#define write_Nreg(reg) \
        st_32   reg, CONST_PTR, $N_reg_offset

#define read_Nreg(reg) \
        ld_32   reg, CONST_PTR, $N_reg_offset

#define make_const(reg, const) \
        add     reg, CONST_PTR, $const; \
        wr_tag  reg, $const_type

#define make_nil(reg) \
        ld_40   reg, CONST_PTR, $nil_offset


#define bind(binding, bound, temp) \
        rd_tag  temp, bound; \
        and     temp, temp, $cdr_type; \
        cmp_br_delayed  neq, temp, $cdr_type, @101f; \
        rd_tag  temp, binding; \
        or      temp, temp, $cdr_type; \
        wr_tag  binding, temp; \
@101:   st_40   binding, bound, 0; \
        add     temp, bound, 0; \
        wr_tag  temp, 0; \
        ! Trail in Bind; \
        trail(temp)


#define call_unify() \
        jump    unify$w; \
        rd_special      T3, cpu_pc


/*******************  allocate  **********************/

#define allocate() \
        ! Allocate; \
        add     T2, E, 0; \
        cmp_br_delayed  ge, T2, B, @1f; \
        Nop; \
        jump    @2f$w; \
        add     E, B, 0; \
@1:     read_Nreg(T1); \
        add     E, E, $env_size; \
        add     E, E, T1; \
@2:     st_40   CP, E, $saved_CP_offset; \
        st_40   T2, E, $saved_E_offset; \
        st_40   T1, E, $saved_N_offset; \
        rd_tag  T1, CONST_PTR; \
        wr_tag  B, T1; \
        st_40   B, E, $saved_B_offset


/*********************  deallocate  *********************/

#define deallocate() \
        ! Deallocate; \
        ld_40   CP, E, $saved_CP_offset; \
        ld_40   T1, E, $saved_N_offset; \
        write_Nreg(T1); \
        ld_40   E, E, $saved_E_offset

/*******************  call  **********************/

#define call_proc(label, N_value) \
        ! Call_proc; \
        wr_tag  CONST_PTR, $cut_0; \
        add     T1, r0, $N_value; \
        sll     T1, T1, $3; \
        write_Nreg(T1); \
        jump    label/**/$w; \
        rd_special      CP, cpu_pc

/*******************  cut  **********************/

#define cut() \
        ! Cut; \
        get_stack_base(T2); \
@1:     cmp_br_delayed  eq, B, T2, @3f; \
        ld_40   T1, E, $saved_B_offset; \
        cmp_br_delayed  eq, B, T1, @2f; \
        Nop; \
        Pop_ChoicePoint(T1); \
        jump    @1b$w; \
        Nop; \
@2:     tag_cmp_br_delayed      ne_tag, T1, $cut_1, @3f; \
        Nop; \
        Pop_ChoicePoint(T1); \
@3:     wr_tag  CONST_PTR, $cut_0; \
        st_40   B, E, $saved_B_offset


/*******************  cutd  **********************/

#define cutd(label) \
        ! Cutd; \
@1:     ld_32   T1, CONST_PTR, $label; \
        cmp_br_delayed  eq, T1, OLD_BP, @2f; \
        Nop; \
        Pop_ChoicePoint(T1); \
        jump    @1b$w; \
        Nop; \
@2:     Pop_ChoicePoint(T1)

/*******************  call_fail  **********************/

#define call_fail() \
        ! Call_fail; \
        jump    fail$w; \
        Nop

#define escape(fcn, reg) \
        ! Escape; \
        st_32   r28, CONST_PTR, $save_r28_offset; \
        st_32   r26, CONST_PTR, $save_r10_offset; \
        st_32   r9, CONST_PTR, $save_r9_offset; \
        add     r28, r0, $fcn; \
        call    (-71 & 0xfffffff); \
        Nop; \
        add     reg, r28, 0; \
        ld_32   r28, CONST_PTR, $save_r28_offset; \
        ld_32   r26, CONST_PTR, $save_r10_offset; \
        ld_32   r9, CONST_PTR, $save_r9_offset

/**********************  comparison  ********************/

#define equal() \
        ! Equal; \
        add     T1, A1, 0; \
        add     T2, A2, 0; \
        deref(T1, T3); \
        deref(T2, T4); \
        call_unify(); \
        cmp_br_delayed  eq, T4, $1, @1f; \
        Nop; \
        call_fail(); \
@1:

#define compare(op) \
        ! Compare; \
        deref(A1, T1); \
        deref(A2, T2); \
        cmp_br_delayed  neq, T1, $const_num_type, @1f; \
        Nop; \
        cmp_br_delayed  neq, T2, $const_num_type, @1f; \
        Nop; \
        cmp_br_delayed  op, A1, A2, @2f; \
        Nop; \
@1:     call_fail(); \
@2:


#define less_than() \
        ! Less_than; \
        compare(lt)

#define less_than_or_equal() \
        ! Less_than_or_equal; \
        compare(le)

#define greater_than() \
        ! Greater_than; \
        compare(gt)

#define greater_than_or_equal() \
        ! Greater_than_or_equal; \
        compare(ge)

/********************  is  ****************************/

#define is_escape() \
        ! Is; \
        deref(A1, T1); \
        deref(A2, T2); \
        deref(A4, T3); \
        and     T1, T1, $type_mask; \
        cmp_br_delayed  eq, T1, $var_type, @1f; \
        Nop; \
        cmp_br_delayed  neq, T1, $const_type, @2f; \
        Nop; \
@1:     tag_cmp_br_delayed      ne_tag, A2, $const_num_type, @2f; \
        rd_tag  T4, A3; \
        tag_cmp_br_delayed      ne_tag, A4, $const_num_type, @2f; \
        and     T4, T4, $const_type; \
        cmp_br_delayed  neq, T4, $const_type, @2f; \
        Nop; \
        escape(ARITH, T4); \
        tag_cmp_br_delayed      ne_tag, A1, $const_num_type, @3f; \
        Nop; \
        cmp_br_delayed  eq, A1, T4, @3f; \
        Nop; \
@2:     call_fail(); \
@3:     cmp_br_delayed  neq, T1, $var_type, @4f; \
        Nop; \
        wr_tag  T4, $const_num_type; \
        bind(T4, A1, T1); \
@4:

#define is_2_escape() \
        ! Is_2; \
        jump    is_2$w; \
        rd_special      T9, cpu_pc

/***********************  esc_call  ***********************/

/*
 * Instead of providing the escape call routine I provide two primitives.
 * The first shifts up all of the argument registers.
 * The second does the jump.  All escape calls have to be translated by
 * hand to use these primitives.
 */

#define esc_shift_regs() \
        ! Esc_shift_regs; \
        add     A1, A2, 0; \
        add     A2, A3, 0; \
        add     A3, A4, 0; \
        add     A4, A5, 0; \
        add     A5, A6, 0; \
        add     A6, A7, 0; \
        add     A7, A8, 0

#define esc_jump(label) \
        ! Esc_esc_jump; \
        jump    label/**/$w; \
        rd_special      CP, cpu_pc

/************************  var  *************************/

#define var_escape() \
        ! Var_escape; \
        add     T2, A1, 0; \
        deref(T2, T1); \
        and     T1, T1, $type_mask; \
        cmp_br_delayed  eq, T1, $var_type, @1f; \
        Nop; \
        call_fail(); \
@1:

/*************************  setter  **********************/

#define setter() \
        ! Setter; \
        add     T2, A1, 0; \
        add     T1, A2, 0; \
        deref(T2, T3); \
        deref(T1, T3); \
        tag_cmp_br_delayed      eq_tag, T1, $const_num_type, @1f; \
        Nop; \
        call_fail(); \
@1:     cmp_br_delayed  le, T1, $15, @2f; \
        Nop; \
        add     T1, r0, 0; \
        wr_tag  T1, 0; \
@2:     wr_tag  T1, $var_type; \
        st_40   T1, T1, 0; \
        add     T8, H, 0; \
        ld_32   H, r9, $H2_offset; \
        jump    esc_unify$w; \
        rd_special      T3, cpu_pc; \
        st_32   H, r9, $H2_offset; \
        add     H, T8, 0; \
@3:

/*************************  access  **********************/

#define access() \
        ! Access; \
        add     T2, A1, 0; \
        add     T1, A2, 0; \
        deref(T2, T3); \
        deref(T1, T3); \
        tag_cmp_br_delayed      eq_tag, T1, $const_num_type, @1f; \
        Nop; \
        call_fail(); \
@1:     cmp_br_delayed  le, T1, $15, @2f; \
        Nop; \
        add     T1, r0, 0; \
        wr_tag  T1, 0; \
@2:     ld_40   T1, T1, 0; \
        call_unify(); \
        cmp_br_delayed  neq, T4, 0, @3f; \
        Nop; \
        call_fail(); \
@3:

/******************  integer  **********************/

#define integer() \
        ! Integer; \
        add     T1, A1, 0; \
        deref(T1, T2); \
        tag_cmp_br_delayed      eq_tag, T1, $const_num_type, @1f; \
        Nop; \
        call_fail(); \
@1:

/*******************  execute  *************************/

#define execute(label) \
        ! Execute; \
        jump    label/**/$w; \
        Nop

/******************  get_variable_var  ********************/

#define get_variable_reg(An, Ai) \
        add     An, Ai, 0

#define get_variable_var(env_offset, Ai) \
        st_40   Ai, E, $env_offset

/******************  get_constant  **********************/

#define get_constant_string(const, Ai) \
        ! Get_constant_string; \
        add_nt  T2, Ai, 0; \
        deref(T2, T3); \
        make_const(T1, const); \
        call_unify(); \
        cmp_br_delayed  eq, T4, $1, @1f; \
        Nop; \
        jump    fail$w; \
        Nop; \
@1:

#define get_constant_number(const, Ai) \
        ! Get_constant_number; \
        add_nt  T2, Ai, 0; \
        deref(T2, T3); \
        ld_32   T1, CONST_PTR, $const; \
        wr_tag  T1, $const_num_type; \
        call_unify(); \
        cmp_br_delayed  eq, T4, $1, @1f; \
        Nop; \
        jump    fail$w; \
        Nop; \
@1:

/*******************  get_list  ***********************/

#define get_list(Ai) \
        ! Get_list; \
        add     T2, Ai, 0; \
        deref(T2, T3); \
        st_32   r0, CONST_PTR, $smode_offset; \
        and     T3, T3, $type_mask; \
        cmp_br_delayed  neq, T3, $list_type, @1f; \
        Nop; \
        add     S, T2, 0; \
        jump    @4f$w; \
        wr_tag  S, $read_mode; \
@1:     tag_cmp_br_delayed      ne_tag, T2, $unbound_var_type, @2f; \
        sub     T4, H, $8; \
        cmp_br_delayed  neq, T2, T4, @2f; \
        wr_tag  T4, $cdr_type; \
        sub     H, H, $8; \
        ! Trail in Get_list; \
        trail(T4); \
        jump    @4f$w; \
        wr_tag  S, $write_mode; \
@2:     cmp_br_delayed  neq, T3, $var_type, @3f; \
        add     T1, H, 0; \
        wr_tag  T1, $list_type; \
        call_unify(); \
        jump    @4f$w; \
        wr_tag  S, $write_mode; \
@3:     jump    fail$w; \
        Nop; \
@4:


/******************  get_nil  **********************/

#define get_nil(Ai) \
        ! Get_nil; \
        add     T1, Ai, 0; \
        deref(T1, T2); \
        ld_40   T2, CONST_PTR, $nil_offset; \
        wr_tag  T2, $const_type; \
        call_unify(); \
        cmp_br_delayed  eq, T4, $1, @1f; \
        Nop; \
        jump    fail$w; \
        Nop; \
@1:


/******************  get_structure  **********************/

#define get_structure(struct, Ai) \
        ! Get_structure; \
        add     T1, Ai, 0; \
        deref(T1, T3); \
        make_const(T2, struct); \
        add     T4, r0, $1; \
        st_32   T4, CONST_PTR, $smode_offset; \
        and     T3, T3, $type_mask; \
        cmp_br_delayed  neq, T3, $var_type, @1f; \
        Nop; \
        st_40   T2, H, 0; \
        add     T2, H, 0; \
        wr_tag  T2, $struct_type; \
        add     H, H, $8; \
        st_40   T2, T1, 0; \
        wr_tag  T1, 0; \
        ! Trail in Get_structure; \
        trail(T1); \
        jump    @3f$w; \
        wr_tag  S, $write_mode; \
@1:     cmp_br_delayed  neq, T3, $struct_type, @2f; \
        ld_40   T3, T1, 0; \
        cmp_br_delayed  neq, T3, T2, @2f; \
        Nop; \
        add     S, T1, $8; \
        jump    @3f$w; \
        wr_tag  S, $read_mode; \
@2:     call_fail(); \
@3:


/*******************  get_value  *********************/

#define get_value_reg(An, Ai) \
        ! Get_value_reg; \
        add_nt  T2, An, 0; \
        deref(T2, T3); \
        add_nt  T1, Ai, 0; \
        deref(T1, T3); \
        add     T9, T1, 0; \
        call_unify(); \
        cmp_br_delayed  eq, T4, $1, @1f; \
        Nop; \
        jump    fail$w; \
        Nop; \
@1:     add_nt  An, T9, 0

#define get_value_var(env_offset, Ai) \
        ! Get_value_var; \
        ld_40   T2, E, $env_offset; \
        deref(T2, T3); \
        add_nt  T1, Ai, 0; \
        deref(T1, T3); \
        call_unify(); \
        cmp_br_delayed  eq, T4, $1, @1f; \
        Nop; \
        jump    fail$w; \
        Nop; \
@1:

/*******************  mark  **********************/

/*
 * An instruction that isn't used anymore?
 */
#define mark() \
        ! Mark; \
        Nop

/*******************  pause  **********************/

/*
 * An instruction that isn't used anymore?
 */
#define pause() \
        ! Pause; \
        Nop

/****************  procedure  *********************/

#define procedure(name) \
        ! Procedure; \
name:

/****************  proceed  *********************/

#define proceed() \
        ! Proceed; \
        cmp_br_delayed  eq, CP, 0, @1f; \
        jump_reg        CP, $4; \
        wr_tag  CONST_PTR, $cut_0; \
@1:     jump    success$w; \
        Nop

/*******************  put_constant  **********************/

#define put_constant_string(const, Ai) \
        ! Put_constant_string; \
        make_const(Ai, const)

#define put_constant_number(const, Ai) \
        ! Put_constant_number; \
        ld_32   Ai, CONST_PTR, $const; \
        wr_tag  Ai, $const_num_type

/*****************  put_list  *******************/

#define put_list(Ai) \
        ! Put_list; \
        wr_tag  S, $write_mode; \
        st_32   r0, CONST_PTR, $smode_offset; \
        add_nt  Ai, H, 0; \
        wr_tag  Ai, $list_type

/*****************  put_nil  *******************/

#define put_nil(Ai) \
        ! Put_nil; \
        make_nil(Ai); \
        wr_tag  Ai, $const_type

/*******************  put_structure  **********************/

#define put_structure(struct, Ai) \
        ! Put_structure; \
        add     T4, r0, $1; \
        st_32   T4, CONST_PTR, $smode_offset; \
        make_const(T1, struct); \
        add     Ai, H, 0; \
        wr_tag  Ai, $struct_type; \
        st_40   T1, H, 0; \
        add     H, H, $8; \
        wr_tag  S, $write_mode


/*******************  put_unsafe_value  **********************/

#define put_unsafe_value(env_offset, Ai) \
        ! Put_unsafe_value; \
        ld_40   T1, E, $env_offset; \
        deref(T1, T2); \
        and     T2, T2, $var_type; \
        cmp_br_delayed  neq, T2, $var_type, @1f; \
        Nop; \
        cmp_br_delayed  le, T1, E, @1f; \
        Nop; \
        add     T1, H, 0; \
        wr_tag  T1, $var_type; \
        st_40   T1, H, 0; \
        add     H, H, $8; \
@1:     add_nt  Ai, T1, 0


/***************  put_value  *******************/

#define put_value_var(env_offset, Ai) \
        ! Put_value_var; \
        ld_40   Ai, E, $env_offset

#define put_value_reg(An, Ai) \
        ! Put_value_reg; \
        add_nt  Ai, An, 0

/***************  put_variable  ************************/

#define put_variable_var(env_offset, Ai) \
        ! Put_variable_var; \
        add_nt  Ai, E, $env_offset; \
        wr_tag  Ai, $var_type; \
        st_40   Ai, E, $env_offset

#define put_variable_reg(An, Ai) \
        ! Put_variable_reg; \
        add_nt  Ai, H, 0; \
        wr_tag  Ai, $var_type; \
        add_nt  An, Ai, 0; \
        st_40   Ai, H, 0; \
        add_nt  H, H, $8

/**********************  quit  ********************/

#define quit() \
        ! Quit; \
        add     r28, r0, 0; \
        jump    start$w

#define end() quit()

/*******************  retry  **********************/

#define retry(label) \
        ! Retry; \
        wr_tag  CONST_PTR, $cut_1; \
        ld_32   T1, CONST_PTR, $label; \
        rd_special      T2, cpu_pc; \
        jump_reg        T1, 0; \
        add     OLD_BP, T2, $12

/*******************  retry_me_else  **********************/

#define retry_me_else(label) \
        ! Retry_me_else; \
        ld_32   OLD_BP, CONST_PTR, $label; \
        wr_tag  CONST_PTR, $cut_1

/*******************  switch_on_constant  *******************/

#define switch_on_constant(mask, label) \
        ! Switch_on_constant; \
        deref(A1, T3); \
        and     T1, T3, $type_mask; \
        cmp_br_delayed  neq, T1, $const_type, @4f; \
        ld_32   T1, CONST_PTR, $mask; \
        add     T2, CONST_PTR, $label; \
        and     T3, T3, $const_num_type; \
@1:     cmp_br_delayed  le, T1, 0, @4f; \
        ld_32   T4, T2, 0; \
        add     T4, T4, CONST_PTR; \
        cmp_br_delayed  eq, A1, T4, @3f; \
        Nop; \
@2:     sub     T1, T1, $2; \
        jump    @1b$w; \
        add     T2, T2, $12; \
@3:     ld_32   T4, T2, $4; \
        cmp_br_delayed  neq, T3, T4, @2b; \
        ld_32   T4, T2, $8; \
        jump_reg        T4, 0; \
        Nop; \
@4:     call_fail()


/*******************  switch_on_structure  *******************/

#define switch_on_structure(mask, label) \
        ! Switch_on_structure; \
        deref(A1, T1); \
        and     T1, T1, $type_mask; \
        cmp_br_delayed  neq, T1, $struct_type, @3f; \
        ld_32   T1, CONST_PTR, $mask; \
        add     T2, CONST_PTR, $label; \
        ld_40   T4, A1, 0; \
@1:     cmp_br_delayed  le, T1, 0, @3f; \
        ld_32   T3, T2, 0; \
        add     T3, T3, CONST_PTR; \
        cmp_br_delayed  eq, T4, T3, @2f; \
        sub     T1, T1, $2; \
        jump    @1b$w; \
        add     T2, T2, $8; \
@2:     ld_32   T4, T2, $4; \
        jump_reg        T4, 0; \
        Nop; \
@3:     call_fail()

/*******************  switch_on_term  *********************/

#define switch_on_term(const_label, list_label, struct_label) \
        ! Switch_on_term; \
        add_nt  T1, A1, 0; \
        deref(T1, T2); \
        and     T2, T2, $type_mask; \
        cmp_br_delayed  neq, T2, $const_type, @1f; \
        Nop; \
        jump    const_label/**/$w; \
        Nop; \
@1:     cmp_br_delayed  neq, T2, $list_type, @2f; \
        Nop; \
        jump    list_label/**/$w; \
        Nop; \
@2:     cmp_br_delayed  neq, T2, $struct_type, @3f; \
        Nop; \
        jump    struct_label/**/$w; \
        Nop; \
@3:

/*******************  trust  **********************/

#define trust(label) \
        ! Trust; \
        Pop_ChoicePoint(T1); \
        wr_tag  CONST_PTR, $cut_0; \
        ld_32   T1, CONST_PTR, $label; \
        jump_reg        T1, $0; \
        Nop

/*******************  trust_me_else  **********************/

#define trust_me_else(label) \
        ! Trust_me_else; \
        Pop_ChoicePoint(T1); \
        wr_tag  CONST_PTR, $cut_0

/*******************  try  **********************/ 

tdefine  try  (label)  \ 

!  Try ;  \ 


add_nt 

SAVE_AX1 ,  Al, 

0; 

\ 

add_nt 

SAVE _AX2  ,  A2  , 

0; 

\ 

add_nt 

SAVE_AX3 ,  A3, 

0; 

\ 

add_nt 

SAVE _AX4 ,  A4 , 

0; 

\ 

add_nt 

SAVE _^AX5 ,  A5 , 

0; 

\ 

add_nt 

SAVE _AX6  ,  A6  , 

0; 

\ 
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add_nt  SAVE_AX7 ,  A7 ,  0 ;  \ 

add_nt  SAVE_AX8,  AS,  0;  \ 

read _Nreg (SAVE_N) ;  \ 


st_32 

st_32 

call 

Nop ;  \ 

ld_32 

ld_32 

add_nt 

add_nt 

add_nt 

add_nt 

add_nt 

add 

cmp_Jor  _de  1  ayed 
Nop ;  \ 

read_Nreg (Tl)  ; 

add 

add 

add 

wr_tag 

ld_32 

rd_special 

jump_reg 

add 


r26,  C0NST_PTR,  $save_rlO_of fset;  \ 
r9 ,  C0NST_PTR,  $save_r9_of fset ;  \ 
(§)lf$w;  \ 

rlO,  CONST _PTR ,  $save_rlO_of fset ;  \ 
r9 ,  C0NST_PTR,  $save_r9_of fset;  \ 

E ,  0LD_E,  0;  \ 

TR,  0LD_TR,  0;  \ 

H,  OLD_H,  0;  \ 

CP,  OLD_CP ,  0;  \ 

BP,  OLD_BP ,  0;  \ 

B,  OLD_B,  0;  \ 

It,  E,  B,  @lf;  \ 


B,  Tl,  E;  \ 

B,  B,  $env_size;  \ 

B,  B,  $4;  \ 

CONST JPTR,  $cut_l ;  \ 

Tl,  CONST_PTR ,  $ label;  \ 
T2,  cpu_pc;  \ 

Tl,  O;  \ 

OLD_BP ,  T2 ,  $12 


/* **********************  try_jne_else 

#define  try_me_else (label)  \ 

!  Try _me_e 1 s  e ;  \ 


add_nt 

SAVE_AX1 ,  Al,  0;  \ 

add_nt 

SAVE_AX2 ,  A2 ,  0;  \ 

add_nt 

SAVE_AX3,  A3,  0;  \ 

add_nt 

SAVE _AX4 ,  A4,  0;  \ 

add_nt 

SAVE_AX5 ,  A5 ,  0;  \ 

add_nt 

SAVE_AX6 ,  A6 ,  0;  \ 

add_nt 

SAVE_AX7,  A7 ,  0;  \ 

add_nt 

SAVE_AX8 ,  A8 ,  0;  \ 

readJMreg (SAVE_N) ;  \ 

st_32 

r 26 ,  CONST_PTR ,  $save_r 

st_32 

r9 ,  CONST _PTR ,  $save_r9 

call 

@lf$w;  \ 

Nop ;  \ 
ld_32 

rlO,  CONST_PTR,  $save_r 

ld_32 

r9,  CONST _PTR,  $save_r9 

add_nt 

E,  OLD_E ,  0;  \ 

add_nt 

TR,  OLD_TR,  0;  \ 

add_nt 

H,  0LDJ1,  0;  \ 
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add_nt  CP ,  OLD_CP ,  0 ;  \ 

add_nt  BP,  OLD_BP,  0;  \ 

add  B ,  OLD_B ,  0 ;  \ 

cmp_br_delayed  It,  E,  B,  @lf;  \ 
Nop ;  \ 


@1: 


read_Nreg (Tl) 

add 

add 

add 

ld_32 

wr__tag 


\ 


B,  Tl,  E;  \ 

B,  B,  $env_size;  \ 

B,  B,  $4;  \ 

OLD_BP,  CONST_PTR,  $  label;  \ 
CONST^PTR,  $cut _ 1 


#define unify_cdr_reg(An) \
        ! Unify_cdr_reg; \
        tag_cmp_br_delayed      eq_tag, S, $read_mode, @1f; \
        add     T1, H, 0; \
        wr_tag  T1, $unbound_var_type; \
        st_40   T1, H, 0; \
        add_nt  An, H, 0; \
        wr_tag  An, $var_type; \
        jump    @3f$w; \
        add_nt  H, H, $8; \
@1:     ld_40   An, S, 0; \
        rd_tag  T2, An; \
        and     T2, T2, $cdr_type; \
        cmp_br_delayed  eq, T2, $cdr_type, @2f; \
        Nop; \
        add_nt  An, S, 0; \
        jump    @3f$w; \
        wr_tag  An, $list_type; \
@2:     rd_tag  T2, An; \
        and     T2, T2, $not_cdr_type; \
        wr_tag  An, T2; \
@3:


#define unify_cdr_var(env_offset) \
        ! Unify_cdr_var; \
        tag_cmp_br_delayed      eq_tag, S, $read_mode, @1f; \
        add     T1, H, 0; \
        wr_tag  T1, $unbound_var_type; \
        st_40   T1, H, 0; \
        wr_tag  T1, $var_type; \
        st_40   T1, E, $env_offset; \
        jump    @3f$w; \
        add_nt  H, H, $8; \
@1:     ld_40   T1, S, 0; \
        rd_tag  T2, T1; \
        and     T2, T2, $cdr_type; \
        cmp_br_delayed  eq, T2, $cdr_type, @2f; \
        Nop; \
        add_nt  T1, S, 0; \
        wr_tag  T1, $list_type; \
        jump    @3f$w; \
        st_40   T1, E, $env_offset; \
@2:     rd_tag  T2, T1; \
        and     T2, T2, $not_cdr_type; \
        wr_tag  T1, T2; \
        st_40   T1, E, $env_offset; \
@3:


/*********************  unify_constant  ********************/

#define unify_constant_string(const) \
        ! Unify_constant_string; \
        make_const(T8, const); \
        jump    unify_const_func$w; \
        rd_special      T9, cpu_pc

#define unify_constant_number(const) \
        ! Unify_constant_number; \
        ld_32   T8, CONST_PTR, $const; \
        wr_tag  T8, $const_num_type; \
        jump    unify_const_func$w; \
        rd_special      T9, cpu_pc

/**************************  unify_nil  ****************************/

#define unify_nil() \
        ! Unify_nil; \
        tag_cmp_br_delayed      eq_tag, S, $write_mode, @2f; \
        Nop; \
        ld_40   T1, S, 0; \
        rd_tag  T2, T1; \
        and     T2, T2, $cdr_type; \
        cmp_br_delayed  neq, T2, $cdr_type, @1f; \
        make_nil(T2); \
        call_unify(); \
        cmp_br_delayed  neq, T4, 0, @3f; \
        Nop; \
@1:     jump    fail$w; \
        Nop; \
@2:     make_nil(T1); \
        st_40   T1, H, 0; \
        add     H, H, $8; \
@3:

/***************  unify_value  ***************/

#define unify_unsafe_value_reg(An) unify_value_reg(An)
#define unify_unsafe_value_var(env_offset) unify_value_var(env_offset)

#define unify_value_reg(An) \
        ! Unify_value_reg; \
        add     T8, An, 0; \
        jump    unify_value$w; \
        rd_special      T9, cpu_pc

#define unify_value_var(env_offset) \
        ! Unify_value_var; \
        ld_40   T8, E, $env_offset; \
        jump    unify_value$w; \
        rd_special      T9, cpu_pc


/********************  unify_variable  *********************/

#define unify_variable() \
tag_cmp_br_delayed eq_tag, S, $write_mode, @2f; \
Nop; \
ld_40 T1, S, 0; \
decdr(T1); \
rd_tag T3, T1; \
and T4, T3, $cdr_type; \
cmp_br_delayed neq, T4, $cdr_type, @3f; \
Nop; \
tag_cmp_br_delayed eq_tag, T1, $unbound_var_type, @1f; \
Nop; \
jump fail$w; \
Nop; \
@1: add T2, H, 0; \
wr_tag T2, $list_cdr_type; \
call_unify(); \
wr_tag S, $write_mode; \
@2: add T1, H, 0; \
wr_tag T1, $var_type; \
st_40 T1, H, 0; \
add H, H, $8

#define unify_variable_reg(An) \
! Unify_variable_reg; \
unify_variable(); \
@3: add An, T1, 0

#define unify_variable_var(env_offset) \
! Unify_variable_var; \
unify_variable(); \
@3: st_40 T1, E, $env_offset

y* *****************  *  unify _ void  **********************/ 


#define  unify_void  (num)  \ 

!  Unify_void;  \ 

add  T5,  rO,  $num;  \ 

tag_cmp_br _de  1  ayed  eq_tag,  S,  $write_mode. 


@1 
@ 2 


ld_32 
cmp  Jor_de  1  ayed 
add 
sll 
jump 
add 
add 

cmp_br_de layed 
Nop ;  \ 
ld_40 


Tl,  E,  $smode_of fset ;  \ 
eq,  Tl,  $0,  @lf;  \ 

Tl,  rO,  $num;  \ 

Tl,  Tl,  $3;  \ 

@7f$w;  \ 

S,  S,  Tl;  \ 

Tl,  rO,  0;  \ 
ge,  Tl,  T5 ,  @7f;  \ 

T2 ,  S,  0;  \ 


@3: 


@4: 

@5: 

@6: 


decdr(T2);  \ 

rd_tag 

and 

cmp_br_de  1  ayed 

rd_tag 

and 

cmp_br  _de  1  ayed 
Nop;  \ 

call_fail  ()  ;  \ 

wr_tag 

sub 

add 

wr_tag 

jump 

st_40 

jump 

add 

add 

cmp_Jor_de  1  ayed 
add 

wr_tag 

st_40 

add 

jump 

add 


T3,  T2 ;  \ 

T3,  T3,  9cdr_type;  \ 
neq,  T3,  $cdr_type,  @4f;  \ 
T4,  T2 ;  \ 

T4,  T4,  $var_type;  \ 
eq,  T4,  $var_type,  @3f;  \ 


S,  $vrritejnode;  \ 

T5,  T5 ,  Tl;  \ 

Tl,  H,  0;  \ 

Tl,  $list_cdr_type;  \ 
@5f$w;  \ 

Tl,  T2 ,  0;  \ 

(512b$w;  \ 

Tl,  Tl,  $1;  \ 

Tl,  rO,  0;  \ 

ge,  Tl,  T5 ,  @7f;  \ 

T2 ,  H,  0;  \ 

T2,  $var_type;  \ 

T2 ,  H,  0;  \ 

H,  H,  $8;  \ 

(516b$w;  \ 

Tl,  Tl,  91;  \ 


@7  : 


@5f;  \ 
#define CONST_PTR r2
#define A1 r3
#define X1 r3
#define A2 r4
#define X2 r4
#define A3 r5
#define X3 r5
#define A4 r6
#define X4 r6
#define A5 r7
#define X5 r7
#define A6 r8
#define X6 r8
#define A7 r9
#define X7 r9
#define A8 r1
#define X8 r1

#define OLD_BP r10
#define OLD_E r11
#define OLD_TR r12
#define OLD_H r13
#define OLD_B r14
#define OLD_CP r15

#define S r16
#define SAVE_AX1 r16
#define T1 r17
#define SAVE_AX2 r17
#define T2 r18
#define SAVE_AX3 r18
#define T3 r19
#define SAVE_AX4 r19
#define T4 r20
#define SAVE_AX5 r20
#define T5 r21
#define SAVE_AX6 r21
#define T6 r22
#define SAVE_AX7 r22
#define T7 r23
#define SAVE_AX8 r23
#define T8 r24
#define SAVE_N r24
#define T9 r25

#define BP r26
#define E r27
#define TR r28
#define H r29
#define B r30
#define CP r31

#define type_mask 0x03
#define list_type 0x00
#define struct_type 0x01
#define var_type 0x02
#define const_type 0x03
#define unbound 0x10
#define bound_var_type 0x00
#define unbound_var_type 0x12
#define cdr_type 0x10
#define not_cdr_type 0x0f
#define list_cdr_type 0x10
#define nil_type 0x10
#define nil_const_type 0x13
#define num_type 0x08
#define const_num_type 0x0b
#define cut_0 0x00
#define cut_1 0x01
#define read_mode 0x00
#define write_mode 0x01
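
The tag constants above can be read as a small bit layout: the low two bits select the primary type, and bit 0x10 is shared by the names cdr_type, unbound, list_cdr_type and nil_type. The following Python sketch only illustrates that reading. The helper names (primary_type, has_cdr_bit, strip_cdr) are hypothetical, and strip_cdr assumes that clearing the bit with not_cdr_type is what the decdr macro does.

```python
# Constants copied from the listing above.
TYPE_MASK = 0x03
LIST_TYPE, STRUCT_TYPE, VAR_TYPE, CONST_TYPE = 0x00, 0x01, 0x02, 0x03
CDR_TYPE = 0x10
NOT_CDR_TYPE = 0x0F
UNBOUND_VAR_TYPE = 0x12
NIL_CONST_TYPE = 0x13
CONST_NUM_TYPE = 0x0B


def primary_type(tag):
    """Return the two-bit primary type (list, struct, var or const)."""
    return tag & TYPE_MASK


def has_cdr_bit(tag):
    """True when bit 0x10 is set, as tested by 'and Tx, Tx, $cdr_type'."""
    return (tag & CDR_TYPE) == CDR_TYPE


def strip_cdr(tag):
    """Clear bit 0x10 (assumed to mirror the decdr macro)."""
    return tag & NOT_CDR_TYPE
```

Under this reading, unbound_var_type (0x12) is a var_type tag with bit 0x10 set, and nil_const_type (0x13) is a const_type tag with the same bit set.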


#define saved_E_offset 0
#define saved_CP_offset 8
#define saved_B_offset 16
#define saved_N_offset 24
#define env_size 32
#define Y1 32
#define Y2 40
#define Y3 48
#define Y4 56
#define Y5 64
#define Y6 72
#define Y7 80
#define Y8 88
#define Y9 96
#define Y10 104

#define nil_offset 0
#define stack_offset 8
#define save_r28_offset 12
#define save_r10_offset 16
#define save_r9_offset 20
#define N_reg_offset 24
#define smode_offset 28
#define H2_offset 32
#define PDL_offset 36
#define stack_bottom 40
#define first_const_offset 44

#define WRITE 0
#define NL 1
#define ARITH 2
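
The permanent-variable offsets in the listing follow a regular pattern: Y1 starts at env_size and each later slot is 8 bytes further on. A minimal Python sketch of that arithmetic, with a hypothetical helper name y_offset:

```python
ENV_SIZE = 32    # env_size from the listing
SLOT_BYTES = 8   # spacing between consecutive Y offsets in the listing


def y_offset(k):
    """Byte offset of permanent variable Yk (k starts at 1) in an environment."""
    if k < 1:
        raise ValueError("Y variables are numbered from 1")
    return ENV_SIZE + SLOT_BYTES * (k - 1)
```

The listing defines Y1 through Y10 only, so any larger k is an extrapolation of the pattern rather than something the source confirms.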
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r 


v 


#include  "funcs.a" 


/*
* Initialize the constant table:
* 1) Nil pointer
* 2) Pointer to stack for recursive unify (32 bits long)
*/

.org 0x1e000
.long 0xffffffff
.long nil_const_type
.long 0
.long 0
.long 0
.long 0
.long 0
.long 0
.long 0
.long 0
.long 0
#include "defs.h"

/*******************  fail  *******************/

fail:
get_stack_base(T1)
cmp_br_delayed le, B, T1, doabort

unbind_loop:
cmp_br_delayed le, TR, OLD_TR, trail_empty
Nop
sub TR, TR, $8
ld_40 T1, TR, 0
rd_tag T2, T1
and T2, T2, $cdr_type
cmp_br_delayed neq, T2, $cdr_type, unbind1
add T3, r0, $var_type
jump unbind2$w
add T3, r0, $unbound_var_type

unbind1:
ld_40 T2, T1, 0
rd_tag T2, T2
and T2, T2, $cdr_type
cmp_br_delayed neq, T2, $cdr_type, unbind2
Nop
add T3, r0, $unbound_var_type

unbind2:
add T4, T1, 0
wr_tag T4, T3
st_40 T4, T1, 0
jump unbind_loop$w
Nop

trail_empty:
add E, OLD_E, 0
add CP, OLD_CP, 0
add H, OLD_H, 0
rd_special T1, cpu_pc
return T1, $12
Nop
add A1, SAVE_AX1, 0
add A2, SAVE_AX2, 0
add A3, SAVE_AX3, 0
add A4, SAVE_AX4, 0
add A5, SAVE_AX5, 0
add A6, SAVE_AX6, 0
add A7, SAVE_AX7, 0
add A8, SAVE_AX8, 0
write_Nreg(SAVE_N)
st_32 r26, CONST_PTR, $save_r10_offset
st_32 r9, CONST_PTR, $save_r9_offset
call @1f$w
Nop
@1: ld_32 r10, CONST_PTR, $save_r10_offset
ld_32 r9, CONST_PTR, $save_r9_offset
add T1, OLD_BP, 0
jump_reg T1, $0
Nop

doabort:
jump abort$w
Nop
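
The unbind loop in the fail routine pops trail entries until the trail pointer falls back to the value saved in the choice point, resetting each trailed cell to an unbound variable. The Python sketch below mirrors only that control flow. The list-and-dictionary memory model and the name undo_trail are assumptions for illustration, and the distinction the assembly makes between var_type and unbound_var_type tags is not modelled.

```python
def undo_trail(trail, old_tr, memory):
    """Reset every cell trailed since the last choice point.

    trail  : list of cell addresses, most recent binding last
    old_tr : trail length saved in the choice point (plays the role of OLD_TR)
    memory : dict mapping address -> (tag, value)
    """
    while len(trail) > old_tr:
        addr = trail.pop()                 # sub TR, TR, $8 ; ld_40 T1, TR, 0
        memory[addr] = ("unbound", addr)   # cell points to itself again
    return memory
```

For example, with two trailed addresses and old_tr equal to 1, only the most recent binding is undone and the older one is left in place.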


#define return_val(value) \
jump_reg T3, $4; \
add T4, r0, $value

#define push(reg) \
st_40 reg, T5, 0; \
add T5, T5, $8

#define pop(reg) \
sub T5, T5, $8; \
ld_40 reg, T5, 0


/*****************************  unify  *****************************/

/*
* T1      First argument
* T2      Second argument
* T3      Return address
* T4      Return value of unify and temporary until return
* T5      Stack pointer for recursive unifys
* T6, T7  Temporaries
* T8, T9  Cannot use here (needed by callers for temporaries that
*         exist across calls.)
*/


unify : 

ld_32 


T5,  C0NST_RTR,  $stack_of fset 


unify_rest : 

rd_tag 

and 

rd_tag 

and 


T6 ,  T1 

T6 ,  T6 ,  $type_jnask 
T7 ,  T2 

T7,  T7,  $type_mask 
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cmp  _br  _de  1  ayed 
Nop 

cmp_br _de 1 ayed 
Nop 

cmpJor_delayed 

Nop 

rd_tag 

or 

rd_tag 

or 

crnpjor  _de  1  ayed 
Nop 

cmp_br_delayed 

Nop 

return_val (1) 


eq,  T6,  $var_type,  dobind 
eq,  T7,  $var_type,  dobind 
neq,  T6,  $const_type,  not_const 
T6,  T1 

T6 ,  T6 ,  $cdr_type 
T7,  T2 

T7 ,  T7,  $cdr_type 
neq,  T6,  T7y  failed 

neq,  Tl,  T2,  failed 


failed:  return_val (0) 


not_const : 


contl : 


cont2 


cmp_br _de 1 ayed 
Nop 

push (Tl) 
push (T2) 
push  (T3) 

neq. 

T6 

,  T7 ,  failed 

ld_40 

Tl, 

Tl, 

0 

ld_40 

T2 , 

T2 , 

0 

jump 

unify_rest:$w 

rd  special 
pop  (T3) 
pop (T2) 
pop (Tl) 

T3, 

cpu_pc 

cmp  Jor_delayed 
Nop 

return_val  (0) 

eq. 

T4, 

$1,  contl 

add 

T4, 

Tl, 

$8 

ld_40 

Tl, 

T4, 

0 

rd_tag 

T6 , 

Tl 

and 

T6 , 

T6 , 

$cdr_type 

cmp_br _de 1 ayed 
Nop 

eq. 

T6, 

$cdr_type,  cont2 

add 

Tl, 

T4, 

0 

wr_tag 

Tl, 

$list_type 

add 

T4, 

T2 , 

$8 

ld_40 

T2 , 

T4, 

0 

rd_tag 

T6 , 

T2 

and 

T6 , 

T6 , 

$cdr_type 

cmp  _br_delayed 

eq. 

T6 , 

$cdr_type,  cont3 
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cont3 : 


cont4 : 


Nop 

add 

wr_tag 
push  (T3) 
jump 

rd_special 
pop  (T3) 

cmp_br_de  1  ayed 
return_val  (0) 
return_val (1) 


T2,  T4,  0 
T2,  $list_type 

unify_rest$w 
T3,  cpu_pc 

eq,  T4,  $1,  cont4 


dobind: cmp_br_delayed neq, T6, $var_type, one_var
Nop
cmp_br_delayed neq, T7, $var_type, one_var
Nop
cmp_br_delayed ge, T1, T2, bind1
bind(T1, T2, T4)
return_val(1)
bind1: bind(T2, T1, T4)
return_val(1)

one_var: cmp_br_delayed neq, T6, $var_type, bind2
bind(T2, T1, T4)
return_val(1)
bind2: bind(T1, T2, T4)
return_val(1)
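
The overall shape of the unify routine (bind if either argument is a variable, compare if both are constants, otherwise check that the types match and recurse into the arguments) can be sketched in Python as follows. This is an illustrative model, not a transcription of the assembly. The Var class, the tuple encoding of structures, and the explicit trail list are assumptions, and the sketch ignores cdr-coding and the address comparison that the dobind code uses to choose a binding direction.

```python
class Var:
    """Logic variable; ref is None while unbound (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.ref = None


def deref(term):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while isinstance(term, Var) and term.ref is not None:
        term = term.ref
    return term


def unify(a, b, trail):
    """Return True if a and b unify; bound variables are recorded on trail."""
    a, b = deref(a), deref(b)
    if a is b:
        return True
    if isinstance(a, Var):            # corresponds to the dobind path
        a.ref = b
        trail.append(a)
        return True
    if isinstance(b, Var):
        b.ref = a
        trail.append(b)
        return True
    if isinstance(a, tuple) and isinstance(b, tuple):
        # Structures: same functor and arity, then unify argument pairs.
        if len(a) != len(b) or a[0] != b[0]:
            return False
        return all(unify(x, y, trail) for x, y in zip(a[1:], b[1:]))
    return a == b                     # constants must be identical
```

The recursion here uses the Python call stack, whereas the assembly keeps its own stack through the push and pop macros on T5.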


/***************  esc_unify  *********************/

/*
* T1      First argument
* T2      Second argument
* T3      Return address
* T4      Return value of unify and temporary until return
* T5      Stack pointer for recursive unifys (PDL)
* T6, T7  Temporaries
* T8, T9  Cannot use here (needed by callers for temporaries that
*         exist across calls.)
*
* CP      Used as the PDL here. It is saved first however.
*/

#define PDL CP
#define first_call 0x0
#define not_first_call 0x1

#define esc_return_val(value) \
tag_cmp_br_delayed eq_tag, PDL, $not_first_call, @100f; \
Nop; \
pop(CP); \
@100: jump_reg T3, $4; \
add T4, r0, $value

esc_unify:
ld_32 T5, CONST_PTR, $stack_offset
push(CP)
ld_32 PDL, CONST_PTR, $PDL_offset
wr_tag PDL, $first_call

esc_unify_rest:
rd_tag T6, T1
and T6, T6, $type_mask
rd_tag T7, T2
and T7, T7, $type_mask
cmp_br_delayed eq, T6, $var_type, esc_dobind
Nop
cmp_br_delayed eq, T7, $var_type, esc_dobind
Nop
cmp_br_delayed neq, T6, $const_type, esc_not_const
Nop
cmp_br_delayed ne_40, T1, T2, esc_failed
Nop
esc_return_val(1)

esc_failed:
esc_return_val(0)

esc_not_const:
cmp_br_delayed neq, T6, T7, esc_failed
Nop
push(T1)
push(T2)
push(T3)
push(PDL)
wr_tag PDL, $not_first_call
ld_40 T1, T1, 0
ld_40 T2, T2, 0
jump esc_unify_rest$w
rd_special T3, cpu_pc
pop(PDL)
pop(T3)
pop(T2)
pop(T1)
cmp_br_delayed eq, T4, $1, esc_cont1
Nop
esc_return_val(0)

esc_contl : 

add 


T4,  Tl,  $1 
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ld_40 

rd_tag 

and 

cmp_hr_delayed 

Nop 

add 

vr_tag 


Tl ,  T4,  0 
16,  T1 

16,  16 ,  $cdr_type 

eq,  16,  $cdr_type,  esc_cont2 

Tl,  T4,  0 
Tl,  $list_type 


esc_cont2 : 

add 
ld_40 
rd_tag 
and 

cmp_br_de  1  ayed 
Nop 
add 

wr_tag 
esc_cont3 : 

push  (T3) 
push  (PDL) 
wr_tag 
jump 

rd_special 
pop  (PDL) 
pop  (T3) 

cmp_br _de  1  ayed 
esc_return_val (O) 
esc_cont4 : 

esc_return_val (1) 


T4, 

12, 

16, 

16, 

eq, 

12, 

12; 


12, 

T4, 

12 

16, 

16, 


$1 

0 

$cdr_type 

$cdr_type„  esc_cont:3 


T4,  0 
$list_type 


PDL,  $not_first_call 
esc_unify_rest$w 
T3 ,  cpu_pc 


eq, 


T4,  $1,  esc_cont4 


esc_dobind : 

cmp_)or_de  1  ayed 
Nop 

cmp_br _de 1 ayed 
Nop 

cmp_br  _de 1 ayed 

Nop 

jump 

Nop 

esc_bindl : 

add 

add 

jump 

add 

esc_one_var : 

cmp_br _de 1 ayed 

add 

add 

add 


neq,  T6,  $var_type,  esc_one_var 
neq,  T7 ,  $var_type,  esc_one_var 
ge,  Tl,  T2 ,  escjoindl 
do_escjoind$w 


T4,  Tl,  0 
Tl,  T2 ,  0 
do_esc_bind$v 
T2 ,  T4,  0 

neq,  16,  $var_type,  do_escJbind 
T4,  Tl,  0 
Tl,  T2 ,  0 
T2 ,  T4,  0 
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#define  binding  T1 
#define  bound  T2 


do_escJoind : 


rd_tag 

T4,  binding 

and 

T4,  T4,  $type_mask 

cmp_br _de 1 ayed 
Nop 

eq,  T4,  $var_type,  @lf 

cmp_br_de 1 ayed 
Nop 

neq,  T4,  $const_type,  else2 

@1: 

rd_tag 

T4,  bound 

and 

T4,  T4,  $cdr_type 

cmp  Jor  _de  1  ayed 

neq,  T4,  $cdr_type,  elsel 

rd_tag 

T4,  binding 

or 

T4,  T4(  $cdr_type 

wr_tag 

binding,  T4 

elsel : 

jump 

endifl$w 

st_40 

binding,  bound,  0 

else2 : 
i  f  3 : 

rd_tag 

T4,  bound 

and 

T4,  T4,  $cdr_type 

cmp  Jor_delayed 

neq,  T4,  $cdr_type,  else3 

add 

T4,  H,  0 

wr_tag 

T4,  $cdr_type 

jump 

endif 3$w 

st_40 

T4,  bound,  0 

else3 : 

rd_tag 

T6,  binding 

and 

T6,  T6,  $type_mask 

wr_tag 

T4,  T6 

st_40 

T4,  bound,  0 

endi f 3 : 
fori: 

add 

S,  binding,  0 

wr_tag 

S,  0 

for2  : 

ld_40 

T4,  CONST_PTR,  $nil_offset 

cmp_br _de 1 ayed 

eq_40,  S,  T4,  endfor2 

if4: 

ld_40 

T4,  S,  0 

rd_tag 

T6 ,  T4 

and 

T6,  T6,  $type_mask 

cmp_Jor  _de  1  ayed 
Nop 

neq,  T6,  $var_type,  else4 

add 

T6 ,  H,  0 
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wr_tag 

T6 ,  $var_type 

st_40 

T6 ,  H,  0 

st_40 

T6 ,  T4,  0 

jump 

endif4$w 

add 

H,  H,  $8 

else4 : 
if5 : 

cmp  Jor  _de  1  ay  ed 
Nop 

neq,  T6,  $const_type,  else5 

st_40 

T4,  H,  0 

•jump 

endif5$w 

add 

H,  H,  $8 

else5 : 

st_40 

S,  PDL,  $-8 

st_40 

H,  PDL,  $-16 

sub 

PDL,  PDL,  $16 

add 

T7 ,  rO,  $-1 

wr_tag 

T7 ,  T6 

st_40 

T7 ,  H,  0 

add 

H,  H,  $8 

endi f 5 : 
endif4 : 

add 

S,  S,  $8 

ld_40 

T4,  S,  0 

if6  : 

rd_tag 

T6 ,  T4 

and 

T4,  T4,  $cdr_type 

cmp  _br_delayed 

neq,  T4,  $cdr_type,  endi f 6 

if7: 

ld_40 

T6 ,  CONST _PTR,  $nil_offset 

cmp  Jar  _de  1  ayed 
Nop 

eq,  T4,  T6 ,  @lf 

cmp_br  _de 1 ayed 
Nop 

neq,  T4,  S,  else7 

@1: 

add 

S,  T6 ,  0 

if8 : 

cmp  Jor  _de  1  ayed 
Nop 

neq,  T4,  T6,  else8 

st_40 

T4,  H,  0 

jump 

endi f8$w 

add 

H,  H,  $8 

else8 : 

add 

T7 ,  H,  0 

wr_tag 

T7,  $cdr_type 

st_40 

T7 ,  H,  0 

add 

H,  H,  $8 

endi f 8 : 

jump 

endif7$w 
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Nop 

e.lse7 : 

add 

wr_tag 

endif7 : 
endif6 : 

jump 

Nop 

endfor2 : 

ld_32 

cmp_Jor  _de  1  ay  ed 

ld_40 

ld_40 

add 

ld_40 

ld_40 

rd_tag 

and 

add 

wr_tag 

jump 

st_40 


S,  T4,  0 
S,  0 


for2$w 


T6 ,  CONST _J?TR,  $PDL_offset 

ge,  T6 ,  PDL,  end fori 

T4,  PDL,  0 

binding,  PDL,  $8 

PDL,  PDL,  $16 

binding,  binding,  0 

T6 ,  T4,  0 

T6 ,  T6 

T6,  T6,  $type_mask 

T7,  H,  0 

T7 ,  T6 

forl$w 

T7 ,  T4,  0 


end fori : 
endifl : 

esc_return_val (1) 


#undef bound
#undef binding

/*******************  abort  ***********************/

abort: add r28, r0, 0
jump 0
Nop

/******************  success  **********************/

success: add r28, r0, $1
jump 0
Nop

/*********************  unify_const_func  **********/

/*
* T8  constant to unify
* T9  return address
*/


unify_const_func : 
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tag_cmpj3r_delayed  eq_tag,  S,  $write_mode,  @4f 

Nop 

ld_40  Tl,  S,  0 

decdr  (Tl) 

rd_tag  T3,  Tl 

and  T4,  T3,  $cdr_type 

cmpJor_delayed  neq,  T4,  $cdr_type,  @2f 

tag_cmpJor_delayed  ne_tag,  Tl,  $unbound_var_type,  @3f 


@1 :  add 

wr_tag 
call_unify  () 
jump 
wr_tag 

@2:  deref(Tl,  T3) 

add 

call_unify  () 
cmp_Jor  _de  1  ay  ed 
Nop 

@3:  jump 

Nop 

@4 :  add 

st_40 

add 

(5)5 :  jump_reg 

Nop 


T2,  H,  0 
T2,  $list_cdr_type 

@4f$w 

S,  $write_mode 

T2 ,  T8 ,  0 

eq,  T4,  $1,  §5£ 

fail$w 

Tl ,  T8,  0 
Tl,  H,  0 
H,  H,  $8 
T9,  $4 


/*****************************  unify_value  *************************/

/*
* T8  value to unify
* T9  return address
*/


unify_value : 

deref (T8,  Tl) 

tag_cmp_Jor  _de  1  ay ed  eq_tag,  S,  $write_mode,  @4f 

Nop 

ld_40  T2 ,  S,  0 


decdr (T2) 

rd_tag  T3,  T2 

and  T4 ,  T3,  $cdr_type 

cmpJor_delayed  neq,  T4,  $cdr_type,  @2f 
Nop 

tag_cmp_Jor_delayed  ne_tag,  Tl,  $unbound_var_type, 

@1 :  add  Tl,  T2,  0 

add  T2,  H,  0 

wr_tag  T2 ,  $ list_cdr_type 

call_unify  () 


(§3  f 
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jump 

04f$w 

wr_tag 

S,  $write_mode 

@2: 

deref(T2,  T3) 

add 

call_unify  () 

Tl,  T8,  0 

cmp  Jor  _de  1  ayed 
Nop 

eq,  T4,  $1,  05 

03: 

jump 

Nop 

fail$w 

@4: 

add 

Tl,  T8,  0 

add 

T2 ,  H,  0 

wr_tag 

T2 ,  $var_type 

st_40 

T2 ,  H,  0 

add 

call_unify  () 

H,  H,  $8 

05: 

jump_reg 

Nop 

T9,  $4 

/****************************  is_2  *********************/

/*
* T9  Return address
*/

#define Temp T3
#define var T2
#define var_tag Temp
#define val T1
#define val_tag T4
#define op T6
#define n1 T7
#define n2 T8
#define ch T6


is_2  : 

ld_32 


T5,  CONST  _PTR,  $stack_of fset 


var,  Al,  0 
0 


is_2_rest : 

push  (Al) 
push  (A2) 
add 

deref(var,  var_tag) 
add  val,  A2 

deref(val,  val_tag) 

and  var_tag,  var_tag,  $type_mask 

cmp_hr _de 1 ayed  eq,  var_tag,  $var_type,  01  f 
Nop 

cmpJor_delayed  eq,  var_tag,  $const_type,  @lf 
Nop 
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@1: 


(3)2  : 


(5)4 : 


call_fail  () 
and 

cmpjbr _de  1  ay  ed 
Nop 

cmp  Jor  _de  1  ayed 
Nop 

call_fail  () 


val_tag,  val_tag,  $type_mask 
eq,  val_tag,  $const_type,  (5)9f 

eq,  val_tag,  $struct_type,  (3)2  f 


add 

ld_40 

add 

ld_40 

ld_40 

add 


S,  val,  0 
ch,  S ,  0 
S,  S,  $8 
nl,  S,  0 
n2 ,  S,  $8 
S,  S,  $16 


t  ag_cmp_jor  _de  1  ayed 
Nop 


ne_tag,  nl. 


add 
wr_tag 
st_40 
add 

push  (ch) 
push  (n2) 
push (var) 
push (T9) 
jump 

rd_special 
pop (T9) 
pop  (var) 
pop  (n2) 
pop (ch) 
add 

deref  (nl , 


Al, 
Al, 
Al, 
A2 , 


H,  0 

$var_type 
H,  0 
nl ,  0 


Temp) 


is_2_rest$w 
T9 ,  cpu_pc 


nl,  Al,  0 


$struct_type,  (S)4f 


tag_cmp_)or_delayed  ne_tag,  n2. 

Nop 


add 

Al, 

H,  0 

wr_tag 

Al, 

$var  type 

st_40 

Al, 

H,  0 

add 

A2, 

n2 ,  0 

push (ch) 
push  (nl) 
push (var) 
push (T9) 

jump  is_2_rest$w 

rd_special  T9,  cpu_pc 

pop  (T9) 
pop  (var) 
pop  (nl) 


$struct_type,  @4f 
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(S19  : 


pop  (ch) 
add 

deref(n2.  Temp) 


n2,  Al,  0 


and  Temp,  Temp,  $const_type 

cmp_Jbr  _de 1 ay ed  eq.  Temp,  $const_type,  @5f 


rd_tag  Temp,  nl 

and  Temp,  Temp,  $const_type 

cmp Jar _de 1 ay ed  eq.  Temp,  $const_type,  @5f 
Nop 

call_fail  () 


push  (A2) 
push  (A3) 
push  (A4) 
add 
add 
add 

escape  (ARITH, 
pop  (A4) 
pop  (A3) 
pop  (A2) 
wr__tag 
jump 

rd_special 
pop  (A2) 
pop  (Al) 


/*  arg  1  */ 

/*  operator  */ 
/*  arg  2  */ 

A2,  nl  ,  0 
A3,  ch  ,  0 
A4,  n2  ,  0 
Tl) 


Tl,  $const_num_type 
unify_rest$w 

T3 ,  cpu_pc 


jump_reg  T9,  $4 

Nop 
Appendix 3: Macro-Expansion of PLM Instructions to SPUR/Coprocessor

The Prolog coprocessor instructions are divided into six groups:

• Data Transfer: LD, ST, TO, FROM, MOVE

• State Saving and Modifying: PUSH_CHOICEPT, POP_CHOICEPT, PUSH_ENV, POP_ENV, SET_MODE

• Compare and Branch: TAG_CMP_BR_DELAYED, CMP_BR_DELAYED

• Unify: UNIFY_X_BR_DELAYED, UNIFY_Y_BR_DELAYED

• Heap and Trail: MAKE_VAR, PUSH_ONTO_HEAP, UNDO_TRAIL

• Special: HASH

The macro-expansion of PLM instructions into a combination of SPUR and Prolog
coprocessor instructions is given below. Not all PLM instructions use the
coprocessor; many can be implemented directly in SPUR code. Each PLM
instruction is followed immediately by its corresponding SPUR code. Although
this code is by no means debugged or complete, we feel that it provides enough
data to give a reasonable estimate of expected performance and code size.
These instruction sequences were used to generate the data in Tables 15 and 16.

switch_on_term Lc,Ll,Ls

TAG_CMP_BR_DELAYED const,Xi,-,Lc
TAG_CMP_BR_DELAYED list,Xi,-,Ll
TAG_CMP_BR_DELAYED struct,Xi,-,Ls
NOP

switch_on_constant N,T

LD GRj,address(T)
NOP
HASH Xi,GRj,GRk,N
CMP_BR_DELAYED failedHash,-,-,fail
FROM Ri,GRk
NOP
JUMP_REG Ri
NOP

switch_on_structure N,T

LD GRj,address(T)
NOP
HASH Xi,GRj,GRk,N
CMP_BR_DELAYED failedHash,-,-,fail
FROM Ri,GRk
NOP
JUMP_REG Ri
NOP
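
The switch_on_term expansion above is a chain of three tag tests, with control falling through to the next instruction when none of them matches. A minimal Python sketch of that dispatch follows. The numeric tag values are borrowed from the SPUR macro definitions in the preceding appendix, and it is an assumption that the coprocessor would use the same encoding. The function name and the fallthrough argument are hypothetical.

```python
TYPE_MASK = 0x03
LIST_TYPE, STRUCT_TYPE, CONST_TYPE = 0x00, 0x01, 0x03


def switch_on_term(tag, lc, ll, ls, fallthrough):
    """Pick a branch target from the primary type of a dereferenced argument."""
    primary = tag & TYPE_MASK
    if primary == CONST_TYPE:    # TAG_CMP_BR_DELAYED const,Xi,-,Lc
        return lc
    if primary == LIST_TYPE:     # TAG_CMP_BR_DELAYED list,Xi,-,Ll
        return ll
    if primary == STRUCT_TYPE:   # TAG_CMP_BR_DELAYED struct,Xi,-,Ls
        return ls
    return fallthrough           # variable case: no branch taken
```

Only the variable case reaches the code that follows the three branches, which is why the expansion needs no fourth label.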
try L

RD_SPECIAL Ri,PC
TO P,Ri
NOP
PUSH_CHOICEPT
JUMP L
NOP

retry L

RD_SPECIAL Ri,PC,0
FROM Rj,B
NOP
ST Ri,Rj-Poffset
JUMP L
SET_MODE cut,1

trust L

POP_CHOICEPT trust
JUMP L
NOP

try_me_else L

LD P,address(L)
NOP
PUSH_CHOICEPT

retry_me_else L

LD Ri,address(L)
FROM Rj,B
NOP
ST Ri,Rj-Poffset
SET_MODE cut,1

trust_me_else fail

POP_CHOICEPT trust

fail

UNDO_TRAIL
POP_CHOICEPT fail
FROM Ri,P
NOP
JUMP_REG Ri
NOP

cut

POP_CHOICEPT cut

cutd  L 
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LD  GRi,address(L) 

NOP 

POP.  CHOICEPT  cutd,GRi 


proceed 

FROM 
MOVE 
JUMP.  REG 
SET.  MODE 


CP,  Ri 
CP,  P 

Ri 

cut,0 


execute 

JUMP 

SET.  MODE 


Proc 


Proc 

cut,0 


RD.  SPECIAL 

TO 

JUMP 

SET.  MODE 


n,  Proc 

Ri,PC,0 


CP,Ri 

Proc 

cut,0 


allocate 

PUSH.ENV  N 


deallocate 

POP.ENV 

get.  nil 

UNIFY.  X.BR.  DELAYED 
SET.  MODE 


N 

Ai 

const  I  get, Xi, NIL, fail 
unify, read 


get.  constant  c,Ai 

LD 
NOP 

UNIFY.  X.  BR.  DELAYED 
SET.  MODE 


GRi,address(c) 

const  1  get, Xi,GRi, fail 
unify, read 


get.  variable 

MOVE 


or 

MOVE 


[AXI  Y]n,Ai 

XX, Xi,Xn 

XY, Xi,Yn 


get.  list  Ai 

UNIFY.  X.  BR.  DELAYED  list  |  get, Xi,S, fail 

NOP 


get.  structure 

UNIFY.  X.  BR.  DELAYED 
LD 


F,Ai 

struct  1  get, Xi,S, fail 
GRi, address!  F) 
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NOP 

UNIFY. X.  BR.  DELAYED  const  I  unify, GRi,S, fail 
NOP 


get.value  [AXlYjn^Ai 

UNIFY.  X.BR.  DELAYED  val  I  get, Xn,Xi, fail 
CMP. BR. DELAYED  moreToUnify,- ,-,-1 
SET.  MODE  unify, read 

MOVE  XX,Xi,Xn 

or 

UNIFY.  Y.BR.  DELAYED  val  I  get, Xi,Yn, fail 

CMP.  BR.  DELAYED  moreToUnify, -,-,-1 
SET  ."MODE 

put.  nil 

MOVE 

put.  constant 

LD 


unify, read 
Ai 

XX,NIL,Xi 

cA-i 

Xi,address(c) 


put.  variable 

MAKE.  VAR 

MOVE 

or 

MAKE.  VAR 

put.  list 

MAKE.  VAR 
SET. MODE 

put.  structure 

LD 

MAKE.  VAR 
SET.  MODE 
PUSH.  ONTO. 


[AX|Y]nAi 

var  I  heap,-,Xi 
XX,Xi,Xn 

var  I  env,Yn,Xi 

Ai 

list  I  heap,-,Xi 
unify, write 


FAi 

GRi,address(F) 
struct  |  heap,-,Xi 
unify, write 

HEAP  GRi 


put.  value 

MOVE 

or 

MOVE 

put.  unsafe,  value 

MAKE  VAR 


[AX|Y]nAi 

XX,Xn,Xi 

YX,Yn,Xi 

Yn  Ai 


safe,Yn,Xi 


unify_void

use unify_variable n times

unify_value [AX|Y]n

UNIFY_X_BR_DELAYED val|unify|incrS,Xn,S,fail
CMP_BR_DELAYED moreToUnify,-,-,-1
NOP
or
UNIFY_Y_BR_DELAYED val|unify|incrS,Yn,S,fail
CMP_BR_DELAYED moreToUnify,-,-,-1
NOP

unify_variable [AX|Y]n

UNIFY_X_BR_DELAYED var|unify|incrS,NIL,S,fail
MOVE XX,U1,Xn
or
UNIFY_X_BR_DELAYED var|unify|incrS,NIL,S,fail
MOVE XY,U1,Yn

unify_constant c

LD GRi,address(c)
NOP
UNIFY_X_BR_DELAYED const|unify|incrS,GRi,S,fail
NOP

unify_cdr [AX|Y]n

UNIFY_X_BR_DELAYED cdr|unify,NIL,S,fail
MOVE XX,U1,Xn
or
UNIFY_X_BR_DELAYED cdr|unify,NIL,S,fail
MOVE XY,U1,Yn

unify_nil

UNIFY_X_BR_DELAYED const|unify,NIL,S,fail
NOP
Appendix 4: Microcode for a SPUR Prolog Coprocessor

This appendix outlines the microcode requirements of each coprocessor
instruction. Each instruction name is given along with a description of the
fields in the instruction and their sizes. Immediately below each instruction
is a description of the operations that must be performed in each of the SPUR
pipeline stages. The instruction fetch cycle is not shown because it is
identical for all instructions. A fifth pipeline stage, the extended
processing stage, is added to give the coprocessor the extra execution time it
may require. The number next to the heading for the extended processing stage
gives the number of extra cycles required. This special stage occurs between
the second and third stages of the SPUR pipeline.

TAG_CMP_BR_DELAYED mask(5),reg(5),tag(5),offset(9)

R: read reg, mask tag and cmp if no need to deref
E:2 deref (mem access; test for more deref and update reg),
    mask tag and cmp
M:
W: write dereferenced value back into reg

CMP_BR_DELAYED mask(5),reg(5),tag(5),offset(9)

R: read reg or mask tag and cmp
M:
W:

LD reg(5),reg(5),reg(5),offset(9)

R: read regs and calculate source address
M: mem access
W: write reg

HASH reg(5),reg(5),reg(5),immediate(9)

R: read regs
E:3+2i deref; read starting addr; linear search through immediate
    number of entries
M: read address to jump to
W: write address into reg
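
The HASH entry describes a linear search over a table of the given number of entries, with an extended-stage cost of 3+2i cycles. The Python sketch below models that search and its cost. It assumes that i counts the entries examined before the search stops and that the table holds (key, jump address) pairs. Neither assumption is stated in the outline above, and the function name is hypothetical.

```python
def hash_lookup(key, table, n):
    """Linear search of the first n entries of table.

    Returns (jump_address, extra_cycles); jump_address is None on a miss,
    which corresponds to the failedHash condition tested by the next
    CMP_BR_DELAYED instruction in the switch expansions.
    """
    examined = 0
    for entry_key, address in table[:n]:
        examined += 1
        if entry_key == key:
            return address, 3 + 2 * examined
    return None, 3 + 2 * examined
```

Under these assumptions a hit on the second entry of a table costs 7 extra cycles, and a miss over three entries costs 9.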

TO reg(5),reg(5)

R: read reg
M:
W: write reg

FROM reg(5),reg(5)

R: read reg
M:
W: write reg

ST reg(5),reg(5),immediate(14)

R: read regs and calculate destination address
M: mem access
W:

PUSH_CHOICEPT

R: read B reg
E:16 store choice point (15 regs); update HB
M:
W: write new B

POP_CHOICEPT type(2)

trust
R: read B reg, calculate address of H
E:2 read new HB reg, calculate address of B; write HB,
    read new B reg
M:
W: write B

cut/cutd
R: read B reg, read E reg, calculate address of H
E:3 read new HB, calculate address of B; write HB,
    read new B; calculate address of H if loop
M:
W: write last B

fail
R: read B
E:12 read new regs (12 regs) and update H
M:
W: write new B

SET_MODE bit(1),immediate(1)

R: execute set or reset for mode or cut bit
M:
W:

PUSH_ENV immediate(9)

R: read E
E:3 store environment (3 regs)
M: store 4th reg of environment
W: write new E

POP_ENV immediate(9)

R: read E reg
E:3 read environment regs (3 regs)
M:
W: write new E

MOVE X/Y(1),X/Y(1),reg(5),reg(5)

XX
R: read reg
M:
W: write reg

XY
R: read reg, read E
M: write to Y
W:

YX
R: read E
M: read Y
W: write reg

PUSH_ONTO_HEAP reg(5)

R: read reg and H
M: write to heap
W: update H

MAKE_VAR type(3),reg(5),reg(5)

var|heap
R: read H
M: write var
W: update H

var|env
R: read E
M: write var
W: write into reg

lst|heap, str|heap
R: read H
M:
W: write into reg

safe
R: read E
E:5 read Y, read H; deref; write; update H
M:
W: write reg

UNDO_TRAIL

R: read B and TR
E:3i read last TR; read first trail entry, decrement TR;
    write to unbind, loop to read next trail entry if more
M:
W: write new TR

UNIFY_X_BR_DELAYED type(3),get/unify(1),incrS(1),reg(5),reg(5),offset(9)

Write:

const/get
R: read regs
E:4 deref; unify (cmp; write binding; write to trail; update TR)
M: last write of unify
W: update TR

lst/get, str/get
R: read regs
E:4 deref, read H; unify
M: last write of unify
W: update TR

val/get/'moreToUnify
R: read regs
E:16 deref; deref; push onto PDL if lst or str, follow pointer;
    follow other pointer; unify;
    update U1 and U2 (incr pointer and decdr or from
    PDL if end of lst or str)
M:
W:

val/get/moreToUnify
R: read U1 and U2
E:16 push onto PDL if lst or str, follow pointer; follow other
    pointer; deref; deref; unify;
    update U1 and U2 (incr pointer and decdr or from
    PDL if end of lst or str)
M:
W:

Read Mode:

const/unify
R: read S
E:3 get item pointed to by S; unify
M: last write of unify
W: update TR

const/unify/incrS
R: read S
E:9 next pointed to by S; decdr, read H; unify {; write to heap;
    update H}
M:
W: update S

cdr/unify
R: read S
M: get item pointed to by S
W: write to dest reg

var/unify/incrS
R: read S
E:9 next pointed to by S; decdr, read H; unify {; write to heap;
    update H}
M:
W: update S

val/unify/incrS/'moreToUnify
R: read S and reg
E:17 next pointed to by S; deref; deref; push onto PDL if lst or str,
    follow pointer; follow other pointer, read H; unify; {write to
    heap; update H;} update U1 and U2 (incr pointer and decdr or
    from PDL if end of lst or str)
M:
W: update S

val/unify/incrS/moreToUnify
R: read U1 and U2
E:16 push onto PDL if lst or str, follow pointer; follow other pointer;
    deref; deref; unify;
    update U1 and U2 (incr pointer and decdr or from
    PDL if end of lst or str)
M:
W:

Write Mode:

const/unify, const/unify/incrS
R: read reg
M: write to heap
W: update H

cdr/unify, var/unify/incrS
R: read H
E:1 write to reg
M: write to heap
W: update H

val/unify/incrS
R: read H and reg
E:5 write to heap, unify
M:
W: update H

UNIFY_Y_BR_DELAYED type(3),get/unify(1),incrS(1),reg(5),reg(5),offset(9)

Same as UNIFY_X_BR_DELAYED with one extra cycle for access
to memory and read E reg (an extra cycle if it can't be done in parallel).
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